This is a follow-up to my first post, where I compared different MQTT broker implementations written in D, C, Erlang and Go. Then my colleague who wrote the Erlang version decided to write a Java version too, and I felt compelled to do a C++11 implementation. This post was only supposed to add the results of those two to the benchmarks, but unfortunately I had problems with the C++ version, which led to the title of this blog post. More on that later. Suffice it to say for now that the C++ results should be taken with a large lump of salt. Results:
    loadtest (throughput - bigger is better)
    Connections:                  500             750              1k
    D + vibe.d:            166.9 +/- 1.5   171.1 +/- 3.3   167.9 +/- 1.3
    C (Mosquitto):         122.4 +/- 0.4    95.2 +/- 1.3    74.7 +/- 0.4
    Erlang:                124.2 +/- 5.9   117.6 +/- 4.6   117.7 +/- 3.2
    Go:                    100.1 +/- 0.1    99.3 +/- 0.2    98.8 +/- 0.3
    Java:                  105.1 +/- 0.5   105.8 +/- 0.3   105.8 +/- 0.5
    C++11 + boost::asio:   109.6 +/- 2.0   107.8 +/- 1.1   108.5 +/- 2.6

    pingtest (throughput constrained by latency - bigger is better)
    parameters:              400p 20w       200p 200w       100p 400w
    D + vibe.d:             50.9 +/- 0.3    38.3 +/- 0.2    20.1 +/- 0.1
    C (Mosquitto):          65.4 +/- 4.4    45.2 +/- 0.2    20.0 +/- 0.0
    Erlang:                 49.1 +/- 0.8    30.9 +/- 0.3    15.6 +/- 0.1
    Go:                     45.2 +/- 0.2    27.5 +/- 0.1    16.0 +/- 0.1
    Java:                   63.9 +/- 0.8    45.7 +/- 0.9    23.9 +/- 0.5
    C++11 + boost::asio:    50.8 +/- 0.9    44.2 +/- 0.2    21.5 +/- 0.4
In loadtest the C++ and Java implementations turned out to be in the middle of the pack with comparable performance between the two. Both of them are slightly worse than Erlang and D is still a good distance ahead. In pingtest it gets more interesting: Java mostly matches the previous winner (the C version) and beats it in the last benchmark, so it’s now the clear winner. The C++ version matches both of those in the middle benchmark, does well in the last one but only performs as well as the D version in the first one. A win for Java.
Now about my C++ woes. I brought them on myself a little bit, because I approached the task by trying to minimise the amount of work I had to do. After all, writing C++ takes a long while at the best of times, so I ported the broker from my D version, translating it by hand. I gleaned a few insights from doing so:
- Using C++11 made my life a lot easier since it closes the gap with D considerably. const and immutable became const auto, auto remained the same except when used as a return value, etc.
- Having also written both C++ and D versions of the serialisation libraries I used as well as the unit-testing ones made things a lot easier, since I used the same concepts and names.
- I’m glad I took the time to port the unit tests as well. I ended up introducing several bugs in the manual translation.
- A lot of those bugs were initialisation errors that simply don’t exist in D. Or Java. Or Go. Sigh.
- I hate headers with a burning passion. Modules should be the top C++17 priority IMHO, since there’s zero chance of them making it into C++14.
- I missed slices. A lot. std::vector and std::deque are poor substitutes.
- Trying to port code written in a garbage collected language and trying to simply introduce std::unique_ptr and std::shared_ptr where appropriate was a massive PITA. I’m not even sure I got it right, more on that below.
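To make the last point a bit more concrete, here is a minimal sketch of the kind of ownership decision that the garbage-collected version never had to make. It is not code from the broker: `Connection`, `Subscription` and `Broker` are hypothetical names made up for illustration, and a real design might well choose differently. The assumption here is that the broker owns its connections through `std::shared_ptr`, while subscriptions only observe them through `std::weak_ptr`, so that a subscription left behind after a disconnect can't dangle. The default member initialiser on `messageCount` is the C++11 way of avoiding the initialisation bugs mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the broker's real classes.
struct Connection {
    explicit Connection(std::string id) : clientId(std::move(id)) {}
    std::string clientId;
    std::vector<std::string> received;  // payloads delivered to this client
    int messageCount = 0;               // default member initialiser: never left indeterminate
};

struct Subscription {
    std::string topic;
    std::weak_ptr<Connection> subscriber;  // non-owning: must be locked before use
};

class Broker {
public:
    // The broker keeps an owning reference to every connected client.
    std::shared_ptr<Connection> connect(const std::string& clientId) {
        auto conn = std::make_shared<Connection>(clientId);
        _connections.push_back(conn);
        return conn;
    }

    void subscribe(const std::shared_ptr<Connection>& conn, const std::string& topic) {
        assert(conn != nullptr);  // subscribing a null connection is a programming error
        _subscriptions.push_back(Subscription{topic, conn});
    }

    // Drops the broker's owning reference; stale subscriptions stay but become unlockable
    // once every other owner has let go too.
    void disconnect(const std::shared_ptr<Connection>& conn) {
        _connections.erase(
            std::remove(_connections.begin(), _connections.end(), conn),
            _connections.end());
    }

    // Delivers the payload to every live subscriber of the topic (exact match only)
    // and returns how many deliveries were made.
    std::size_t publish(const std::string& topic, const std::string& payload) {
        std::size_t delivered = 0;
        for (const auto& sub : _subscriptions) {
            if (sub.topic != topic) continue;
            if (auto conn = sub.subscriber.lock()) {  // empty if the connection is gone
                conn->received.push_back(payload);
                ++conn->messageCount;
                ++delivered;
            }
        }
        return delivered;
    }

private:
    std::vector<std::shared_ptr<Connection>> _connections;
    std::vector<Subscription> _subscriptions;
};
```

Note that in this sketch a connection only really goes away once the caller has also released its own `std::shared_ptr`; until then `lock()` still succeeds. That is exactly the sort of detail a GC lets you ignore and that is easy to get wrong when retrofitting smart pointers.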
The C++ implementation is incomplete and will continue to be like that, since I’m now bored of it, tired, and just want to move on. It’s also buggy. All of the loadtest benchmarks were done with only 1000 messages instead of the values at the top since it crashes if left to run for long enough. I’m not going to debug it because it’s not going to be any fun and nobody is paying me to do it.
It’s not optimised either. I never even bothered to run a profiler. I was going to do it as soon as I fixed all the bugs, but I gave up long before that. I know it’s doing excessive copying, because copying vectors of bytes around was the easiest way I could get it to compile after copying the D code that uses slices. It was on my TODO list to remove those copies and use iterators instead but, as I mentioned, it’s not going to happen.
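For the curious, this is roughly the difference between the two approaches. The sketch below is not what the broker does: `ByteSlice` and `copyRange` are hypothetical names, and the view is only loosely modelled on a D slice (it assumes the underlying buffer outlives it, which a D slice doesn't have to worry about thanks to the GC).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view over contiguous bytes, loosely modelled on a D slice.
// It does NOT keep the underlying buffer alive; that is the caller's job.
class ByteSlice {
public:
    ByteSlice(const std::uint8_t* begin, const std::uint8_t* end)
        : _begin(begin), _end(end) {}

    explicit ByteSlice(const std::vector<std::uint8_t>& bytes)
        : _begin(bytes.data()), _end(bytes.data() + bytes.size()) {}

    const std::uint8_t* begin() const { return _begin; }
    const std::uint8_t* end() const { return _end; }
    std::size_t size() const { return static_cast<std::size_t>(_end - _begin); }

    std::uint8_t operator[](std::size_t i) const {
        assert(i < size());
        return _begin[i];
    }

    // Similar in spirit to D's bytes[from .. to]: only two pointers are copied.
    ByteSlice slice(std::size_t from, std::size_t to) const {
        assert(from <= to && to <= size());
        return ByteSlice(_begin + from, _begin + to);
    }

private:
    const std::uint8_t* _begin;
    const std::uint8_t* _end;
};

// The copying approach: every sub-range allocates and fills a new vector.
std::vector<std::uint8_t> copyRange(const std::vector<std::uint8_t>& bytes,
                                    std::size_t from, std::size_t to) {
    assert(from <= to && to <= bytes.size());
    return std::vector<std::uint8_t>(
        bytes.begin() + static_cast<std::ptrdiff_t>(from),
        bytes.begin() + static_cast<std::ptrdiff_t>(to));
}
```

With a view like this, `ByteSlice(packet).slice(5, 7)` points straight into `packet`, whereas `copyRange(packet, 5, 7)` allocates a fresh two-byte vector each time it is called.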
I reckon a complete version would probably do as well as Java at pingtest, but I have a hunch that D would probably still win loadtest. This is, of course, pure speculation. So why did I bother to include the C++ results? I thought it would still be interesting and give a rough idea of how it would compare. I wish I had the energy to finish it, but I just wasn’t having fun anymore and I don’t see the point. Writing it from scratch in C++ would have been a better idea, but it definitely would have taken longer. It would’ve looked similar to what I have now anyway (I’d still be the author), but I have the feeling it would have fewer bugs. Thinking about memory management from the start is very different from trying to apply smart pointers to an already existing design that depended on a garbage collector.
My conclusion from all of this is that I really don’t want to write C++ again unless I have to. And that for all the misgivings I had about a garbage collector, it saves me time that I would’ve spent tracking down memory leaks, double frees and all of those other “fun” activities. And, at least for this exercise, it doesn’t even seem to make a dent in performance. Java was the pingtest winner after all, although its GC is a lot better than D’s. To add insult to C++’s injury, that Java implementation took Patrick a morning to write from scratch, and an afternoon to profile and optimise. It took me days to port an existing working implementation from the closest language there is to C++, and I ended up with a crashing binary. It just wasn’t worth the time and effort, but at least now I know that.