Adding Java and C++ to the MQTT benchmarks or: How I Learned to Stop Worrying and Love the Garbage Collector

This is a followup to my first post, where I compared different MQTT broker implementations written in D, C, Erlang and Go. Then my colleague who wrote the Erlang version decided to write a Java version too, and I felt compelled to do a C++11 implementation. This post was only supposed to add the results of those two to the benchmarks, but unfortunately I had problems with the C++ version, which led to the title of this blog post. More on that later. Suffice it to say for now that the C++ results should be taken with a large lump of salt. Results:

loadtest (throughput - bigger is better)
Connections:         500            750            1k
D + vibe.d:          166.9 +/- 1.5  171.1 +/- 3.3  167.9 +/- 1.3
C (Mosquitto):       122.4 +/- 0.4   95.2 +/- 1.3   74.7 +/- 0.4
Erlang:              124.2 +/- 5.9  117.6 +/- 4.6  117.7 +/- 3.2
Go:                  100.1 +/- 0.1   99.3 +/- 0.2   98.8 +/- 0.3
Java:                105.1 +/- 0.5  105.8 +/- 0.3  105.8 +/- 0.5
C++11 + boost::asio: 109.6 +/- 2.0  107.8 +/- 1.1  108.5 +/- 2.6

pingtest (throughput constrained by latency - bigger is better)
parameters:          400p 20w       200p 200w      100p 400w
D + vibe.d:          50.9 +/- 0.3   38.3 +/- 0.2   20.1 +/- 0.1
C (Mosquitto):       65.4 +/- 4.4   45.2 +/- 0.2   20.0 +/- 0.0
Erlang:              49.1 +/- 0.8   30.9 +/- 0.3   15.6 +/- 0.1
Go:                  45.2 +/- 0.2   27.5 +/- 0.1   16.0 +/- 0.1
Java:                63.9 +/- 0.8   45.7 +/- 0.9   23.9 +/- 0.5
C++11 + boost::asio: 50.8 +/- 0.9   44.2 +/- 0.2   21.5 +/- 0.4

In loadtest the C++ and Java implementations turned out to be in the middle of the pack with comparable performance between the two. Both of them are slightly worse than Erlang and D is still a good distance ahead. In pingtest it gets more interesting: Java mostly matches the previous winner (the C version) and beats it in the last benchmark, so it’s now the clear winner. The C++ version matches both of those in the middle benchmark, does well in the last one but only performs as well as the D version in the first one. A win for Java.

Now about my C++ woes: I brought them on myself to some extent, because I approached the task by trying to minimise the amount of work I had to do. After all, writing C++ takes a long while at the best of times, so I ported it from my D version by translating it by hand. I gleaned a few insights from doing so:

  • Using C++11 made my life a lot easier since it closes the gap with D considerably. const and immutable became const auto, auto remained the same except when used as a return value, etc.
  • Having also written both C++ and D versions of the serialisation libraries I used as well as the unit-testing ones made things a lot easier, since I used the same concepts and names.
  • I’m glad I took the time to port the unit tests as well. I ended up introducing several bugs in the manual translation.
  • A lot of those bugs were initialisation errors that simply don’t exist in D. Or Java. Or Go. Sigh.
  • I hate headers with a burning passion. Modules should be the top C++17 priority IMHO since there’s zero chance of them making it into C++14.
  • I missed slices. A lot. std::vector and std::deque are poor substitutes.
  • Trying to port code written in a garbage collected language and trying to simply introduce std::unique_ptr and std::shared_ptr where appropriate was a massive PITA. I’m not even sure I got it right, more on that below.
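
To make the initialisation point concrete, here is a minimal sketch of the kind of difference involved. The struct and function names are made up for illustration and are not taken from the broker's source. D default-initialises integer and boolean fields, whereas a C++ member of built-in type is left with an indeterminate value unless it is given an initialiser, so the sketch uses C++11 default member initialisers to get the D behaviour back.

```cpp
#include <cstdint>
#include <string>

// Hypothetical struct, not taken from the broker's source.
// In D, fields like these are default-initialised (integers to 0,
// bool to false). In C++, a plain `std::uint32_t messagesSent;` member
// would hold an indeterminate value after `ConnectionStats s;`, so
// C++11 default member initialisers are used here instead.
struct ConnectionStats {
    std::uint32_t messagesSent = 0;
    std::uint16_t keepAliveSeconds = 0;
    bool connected = false;
    std::string clientId;  // class types run their default constructor anyway
};

// D's `const x = ...` or `immutable x = ...` roughly maps to `const auto`.
inline std::uint32_t messagesAfterSend(const ConnectionStats& stats) {
    const auto next = stats.messagesSent + 1;  // type deduced from the expression
    return next;
}
```

With the initialisers in place, `ConnectionStats s;` starts out zeroed and disconnected, which is what a line-by-line translation from D silently assumes.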

The C++ implementation is incomplete and will continue to be like that, since I’m now bored of it, tired, and just want to move on. It’s also buggy. All of the loadtest benchmarks were done with only 1000 messages instead of the values at the top since it crashes if left to run for long enough. I’m not going to debug it because it’s not going to be any fun and nobody is paying me to do it.

It’s not optimised either. I never even bothered to run a profiler. I was going to do it as soon as I fixed all the bugs but I gave up long before that. I know it’s doing excessive copying because copying vectors of bytes around was the easiest way I could get it to compile after copying the D code using slices. It was on my TODO list to remove and replace with iterators, but, as I mentioned, it’s not going to happen.
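
As a rough idea of what a copy-free replacement could look like, here is a sketch of a non-owning byte view built from a pointer and a length, which is about as close as C++11 gets to a D slice. `ByteSlice` and `payloadAfterHeader` are hypothetical names, and this is only one possible approach, not what the broker code does. An iterator pair would work along the same lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view over bytes, loosely modelled on a D slice
// (pointer + length). It does not keep the underlying buffer alive, which
// is exactly what the GC takes care of in D.
class ByteSlice {
public:
    ByteSlice(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    // View over a whole vector; no bytes are copied.
    explicit ByteSlice(const std::vector<std::uint8_t>& bytes)
        : data_(bytes.data()), size_(bytes.size()) {}

    // Rough equivalent of D's bytes[from .. to]; also copy-free.
    ByteSlice slice(std::size_t from, std::size_t to) const {
        assert(from <= to && to <= size_);
        return ByteSlice(data_ + from, to - from);
    }

    std::uint8_t operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }
    const std::uint8_t* begin() const { return data_; }
    const std::uint8_t* end() const { return data_ + size_; }

private:
    const std::uint8_t* data_;
    std::size_t size_;
};

// Hypothetical helper: skip a fixed-size header and return the rest.
inline ByteSlice payloadAfterHeader(ByteSlice packet, std::size_t headerSize) {
    return packet.slice(headerSize, packet.size());
}
```

The catch is lifetime: the view is only valid while the vector it points into is alive and not reallocated, so the caller has to guarantee what the garbage collector guarantees for free in D.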

I reckon a complete version would probably do as well as Java at pingtest but I have a hunch that D would probably still win loadtest. This is, of course, pure speculation. So why did I bother to include the C++ results? I thought it would still be interesting and give a rough idea of how it would compare. I wish I had the energy to finish it, but I just wasn’t having fun anymore and I don’t see the point. Writing it from scratch in C++ would have been a better idea, but it definitely would have taken longer. It would’ve looked similar to what I have now anyway (I’d still be the author), but I have the feeling it would have fewer bugs. Thinking about memory management from the start is very different from trying to apply smart pointers to an already existing design that depended on a garbage collector.
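
For illustration, this is roughly what deciding ownership up front might look like. It is a sketch under assumptions: `Broker`, `Connection` and every member name are hypothetical and do not reflect the real design. The broker holds the only strong references to connections, and subscriptions merely observe them through std::weak_ptr, so a disconnected client cannot be kept alive or dereferenced by a stale subscription. In a garbage-collected design both containers would simply hold plain references.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not the broker's real design.
struct Connection {
    explicit Connection(std::string id) : clientId(std::move(id)) {}
    std::string clientId;
    std::vector<std::string> delivered;  // stands in for a socket write
};

class Broker {
public:
    // The broker keeps the only strong reference to each connection.
    std::weak_ptr<Connection> connect(std::string clientId) {
        connections_.push_back(std::make_shared<Connection>(std::move(clientId)));
        return connections_.back();
    }

    // Dropping the strong reference destroys the connection.
    void disconnect(const std::string& clientId) {
        for (auto it = connections_.begin(); it != connections_.end(); ++it) {
            if ((*it)->clientId == clientId) {
                connections_.erase(it);
                return;
            }
        }
    }

    // Subscriptions only observe connections; they do not extend lifetimes.
    void subscribe(std::weak_ptr<Connection> subscriber) {
        subscribers_.push_back(std::move(subscriber));
    }

    // Returns how many live subscribers received the message.
    std::size_t publish(const std::string& message) {
        std::size_t count = 0;
        for (const auto& weak : subscribers_) {
            if (const auto conn = weak.lock()) {  // empty if already disconnected
                conn->delivered.push_back(message);
                ++count;
            }
        }
        return count;
    }

private:
    std::vector<std::shared_ptr<Connection>> connections_;
    std::vector<std::weak_ptr<Connection>> subscribers_;
};
```

Publishing after a `disconnect` skips the expired subscriber instead of touching freed memory, which is the class of bug that retrofitted smart pointers tend to leave behind.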

My conclusion from all of this is that I really don’t want to write C++ again unless I have to. And that for all the misgivings I had about a garbage collector, it saves me time that I would’ve used tracking down memory leaks, double frees and all of those other “fun” activities. And, at least for this exercise, it doesn’t even seem to make a dent in performance. Java was the pingtest winner after all, but its GC is a lot better than D’s. To add insult to C++’s injury, that Java implementation took Patrick a morning to write from scratch, and an afternoon to profile and optimise. It took me days to port an existing working implementation from the closest language there is to C++ and I ended up with a crashing binary. It just wasn’t worth the time and effort, but at least now I know that.


8 thoughts on “Adding Java and C++ to the MQTT benchmarks or: How I Learned to Stop Worrying and Love the Garbage Collector”

  1. C++ result is good considering that you you don’t have a good enough knowledge of the language to permit to write a good performance implementation (or even a working one)

    If would be interesting if someone that can write C++, could write a decent implementation and check it against the D one

    D has one serious advance respect of C++: its metaprogramming capabilities, and would be interesting to see if this is an use case where this advance can matter

    • atilaneves says:

      I never said I didn’t know C++ well. I do. The reason I tried porting it over from D was, as mentioned in the blog, to try and save time since writing C++ takes forever. Headers are a big part of the reason why. Besides, writing it in C++ wouldn’t so much check how performant C++ is as much as pitting boost::asio against whatever it is that Mosquitto uses.

      There are cases in which, due to templates and inlining, C++ code can be faster than C. It wouldn’t be the case here since I use dynamic polymorphism. I wouldn’t expect the C++ version to be faster than C, just for the code to be shorter and more readable.

      • Sorry, I must have been too aggressive
        The first thing I noticed was the lack of move where is should has been, range for copying shared_ptr and some others naive errors

        I continue to fail to view the point of this message, since a benchmark is usually made for evaluating performance characteristic — and don’t make sense to evaluate performance characteristic for bad code
        Usually people first write correct code, than do some profiling, than do some benchmark — as you did in your other messages

        Changing topic, for sure D is interesting. I continue to hesitate to try seriously it for bad runtime (so heard), some features that I consider bad (like @property), and because I think that D make hard to manually manage heap allocated object respect to C++ (I may be wrong here!)
        I find D interesting for two major selling point:
        – Very good metaprogramming capabilities (but not so extended like CamlP4 or Racket — and probably because their goals are different)
        – Design by contract
        – Ability to rely also on a GC (viewed as a fast RC and hoping that is not abused by typical D developer as in Java)

        (and sorry for the various spelling errors in my messages)

  2. atilaneves says:

    It’s possible I forgot a std::move here or there. I never did finish it, after all. The profiling and optimising was to come later, but I got bored and quit.

    The point of the post wasn’t to bash C++ or to prove it loses to any other language in performance. It was that I set out to see exactly how a C++ version would do compared to the other 5 and gave up because, in this case, it was more hassle than it was worth and what was once a fun little hobby project became “real work” except that I don’t get paid for it. I included the results, preliminary as they were, because they set the minimum performance bar.

    @property is discussed all the time and controversial, but you don’t have to use it. The runtime isn’t bad, it’s just that some compiler implementations still have many bugs and the GC isn’t as performant as Java’s.

  3. Dejan says:

    Can you test this java implementation https://code.google.com/p/moquette-mqtt/ and post the results.

  4. […] I wrote an MQTT broker once (twice really, but I never really finished the second version). It’s now my go-to way of learning a new […]

  5. Jon Harrop says:

    Very interesting, thanks.
