At work we recently started using the MQTT protocol, which uses a publish / subscribe model. It’s simple in the good way and well thought out. We went with an open source implementation named Mosquitto. A few weeks ago on the way back from lunch break my colleague Jeff told me he was writing an MQTT broker in Go, his new favourite language. We’re using MQTT at work, I guess he was looking for a new project to write in Go and voilà. It should be a good fit, after all, this is the type of application that Go was made for. But hubris caught up to him when he uttered “And, of course, it’ll be super fast. It won’t even be fair to other languages”. I’m paraphrasing, but that’s how I remember it. You can read Jeff’s account here.
I’m not a fan of Go at all. I wasn’t particularly impressed when I first read about it but given how much I keep hearing about it on proggit and from Jeff himself, I gave it a go a few months back writing a genetic algorithm framework in it. I came out of that experience liking it even less. It’s just not for me. Go is an opinionated language, which would be fine if I agreed with its creators’ opinions. The way my brain works is that I’m on the opposite side of nearly all of them. It does have a few things I like. The absence of semicolons and parentheses, for instance. Goroutines and channels are a huge win. I can live without exceptions, even though I’d rather not, but generics? They can pry them away from my cold dead hands.
D, on the other hand… now we’re talking. Everything I like about C++ and more, with none of the warts. Of course, it has its own warts too, but nothing’s perfect. So, as a D fan and not so much of a Go one, I took Jeff’s statement as a gauntlet to the face. I learned of vibe.d watching the dconf2013 videos and really liked its idea of a synchronous API on top of asynchronous IO. I was convinced I could at least match a Go implementation’s performance, if not exceed it. So I wrote enough of an MQTT broker implementation to be able to run Jeff’s Go benchmark and compare performances. I reached a version that was faster than his after about 2 days. He came up with a second benchmark and my implementation performed poorly, so I went back to optimising. Around this time another colleague wanted in on the competition and used it as an excuse to learn Erlang, and wrote his own implementation. A few rounds of optimising later, and the results were in, which I’ve included below. Explanations on methodology follow.
loadtest (throughput - bigger is better)
Connections: 100 500 750 1k
D + vibe.d: 121.7 +/- 1.5 166.9 +/- 1.5 171.1 +/- 3.3 167.9 +/- 1.3
C (Mosquitto): 106.1 +/- 0.8 122.4 +/- 0.4 95.2 +/- 1.3 74.7 +/- 0.4
Erlang: 104.1 +/- 2.2 124.2 +/- 5.9 117.6 +/- 4.6 117.7 +/- 3.2
Go: 90.9 +/- 11 100.1 +/- 0.1 99.3 +/- 0.2 98.8 +/- 0.3
pingtest (latency - bigger is better)
parameters: 400p 20w 200p 200w 100p 400w
D + vibe.d: 50.9 +/- 0.3 38.3 +/- 0.2 20.1 +/- 0.1
C (Mosquitto): 65.4 +/- 4.4 45.2 +/- 0.2 20.0 +/- 0.0
Erlang: 49.1 +/- 0.8 30.9 +/- 0.3 15.6 +/- 0.1
Go: 45.2 +/- 0.2 27.5 +/- 0.1 16.0 +/- 0.1
All of the numbers are thousands of messages received by the client application per second. All measurements were done on my laptop, a Lenovo W530 running Arch Linux so all of the TCP connections were on localhost. Each number is the mean of several measurements, and I used the standard deviation as an estimate of the systematic error. All of the MQTT broker implementations run in one system thread. Using multiple threads resulted in no performance benefits for latency and worse performance for throughput.
Mosquitto was compiled with gcc 4.8.2, the Go implementation was executed with go run, the D implementation was compiled with dmd 220.127.116.11 and the Erlang version I’m not sure. I installed the Arch Linux erlang package and used my colleague’s Makefile without looking at it.
The two benchmarks are loadtest and pingtest. The former measures throughput whereas the latter measures latency. In loadtest a few hundred connections are set up to the broker. Half of these subscribe to a topic and the other half publishes to that topic as fast as possible. The benchmark ends when all of the subscribers have received a certain number of messages, determined by a command-line argument. I varied the number of connections to see how that would affect each broker. There was no contest here, the D implementation was by far the fastest. With a 100 connections I think there wasn’t enough work to do so that all implementations ended up waiting on IO. Except for Mosquitto, they all scaled rather nicely. I had problems measuring Jeff’s implementation due to a bug. He knows about the bug but just can’t be bothered fixing it. The numbers were taken from Go 1.1 (the pingtest numbers are Go 1.2). When his implementation works, Go 1.2 produces a binary that performs on the order of 10%-15% faster than the numbers above, which might mean equivalent performance to the Erlang implementation. I even think the bug shows up more often in Go 1.2 exactly because the resulting binary is more performant.
In pingtest Jeff tried to write a better benchmark and it measures latency. The two main command-line arguments are the number of connection pairs and the number of wildcard subscribers. For each pair, one of the connections subscribes to a request topic unique to that pair and the partner connection subscribes to a reply topic. One partner publishes a request and waits for the other connection to publish a reply. The number of messages sent per second now depends on the round-trip time between these two. Additionally, the wildcard subscribers receive both the request and reply messages from the first connection pair. The number before the ‘p’ is the number of connection pairs, and the number before the ‘w’ is the number of wildcard subscriber connections. Here Mosquitto is the fastest, but the performance difference diminishes with more wildcards, being on par with the D implementation in the last column. I’m not sure why it’s the fastest. I think there’s a possibility that vibe.d might be switching to the “wrong” fiber but that’s pure speculation on my part.
What about readability and ease of writing? I can’t read Erlang so I can’t comment on that. Despite my preference for D I think the D and Go implementations are equally readable. Since the Erlang unit tests are in the same files as the implementation, it’s hard to know exactly how many lines long it is. It gets worse since it implements most of MQTT, the D implementation essentially only implements what’s necessary to run the benchmarks. With those caveats (and the fact that dependencies aren’t counted) the 3 implementations clock in at somewhere between 800 and 1000 lines, without filtering out blank lines and comments.
Could they be optimised further? Probably. In the end the choice of algorithm and data structures matter more than the programming language so my personal advice is to choose the language that makes you productive. None of them magically made the implementations performant; we all had to profile, analyse, optimise, try ideas and measure. I loved writing it in D, but then again I’m a convert. I particularly enjoyed using the serialisation library I wrote for it, Cerealed. Much of the typical bit twiddling boilerplate in networking code disappeared, and that was only made possible by D’s compile-time reflection and user-defined attributes.