
Computer languages: ordering my favourites

This isn’t even remotely supposed to be based on facts, evidence, benchmarks or anything like that. You could even disagree about what counts as a “scripting language”. All of the below just reflects my personal preferences. In any case, here’s my list of favourite computer languages, divided into two categories: scripting and, err… I guess “not scripting”.

 

My favourite scripting languages, in order:

  1. Python
  2. Ruby
  3. Emacs Lisp
  4. Lua
  5. PowerShell
  6. Perl
  7. bash/zsh
  8. m4
  9. Microsoft batch files
  10. Tcl

 

I haven’t written enough Ruby yet to really know. I suspect I’d like it more than Python, but at the moment I just don’t have enough experience with it to know its warts. Even I’m surprised there’s something below Perl here, but Tcl really is that bad. If you’re wondering where PHP is: I don’t know, because I’ve never written any, but from what I’ve seen and heard I’d expect it to be (in my opinion, of course) better than Tcl and worse than Perl. I’m surprised how high Perl is given my extreme dislike for it, but when I started thinking about it I realised there’s far, far worse.

 

My favourite non-scripting languages, in order:

  1. D
  2. C++
  3. Haskell
  4. Common Lisp
  5. Rust
  6. Java
  7. Go
  8. Objective-C
  9. C
  10. Pascal
  11. Fortran
  12. Basic / Visual Basic

I’ve never used Scheme, in case that explains where Common Lisp is placed. I’m still learning Haskell, so I’m not too sure there. As for Rust, I’ve never written a line of code in it, and yet I think I can confidently place it in the list, especially with respect to Go. It might place higher than C++, but I don’t know yet.

 


Summary of CppCon videos I’ve watched so far

There seemed to be a theme going at DConf 2014. Or a few. Lots of “metaprogramming is great but hard” (including my talk) and also lots of “the GC got in the way somehow”.

After watching a few of this year’s CppCon videos, it seems to me that it also had a few recurring themes. Quite a few talks mentioned using custom allocators while completely doing away with exceptions and/or templates. It seems these practices are more widespread than I thought. It still troubles me that nobody seems interested in measuring what the effects would be. I hear a lot of “exceptions would slow this down”, but no measurements. “Template bloat” shows up a lot, but I don’t understand how the code wouldn’t be just as large if written by hand, bar pathological instances of generating code for int/uint and the like. Besides, isn’t that premature optimisation anyway? Write it neat, then cut down on memory usage if it’s using too much.

Really interesting talks; I recommend everyone check them out.

Neves’s Laws of Testing

I got into a heated argument over testing practices at work the other day. That led me to think about the subject quite a bit, and I’ve come up with a few laws of testing based on my experience:

  1. Only mock what’s needed to make the test deterministic and fast; otherwise call the real code
  2. A flaky test is worse than no test at all
  3. Good tests verify behaviour; bad tests verify an implementation

The worst thing about law number 3 is that it’s just restating the common knowledge that coupling is bad and should be reduced, yet for some reason some people don’t realise they’re increasing coupling when they write tests against an implementation.
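Here’s a contrived D sketch of law number 3 (all names made up): the first test survives any internal rewrite, while the second breaks the moment the private cache is swapped for, say, a database.

struct Repo {
    private int[string] cache;   // implementation detail
    int fetch(string key) { return cache.get(key, -1); }
    void store(string key, int value) { cache[key] = value; }
}

unittest {   // good: verifies behaviour through the public interface
    Repo repo;
    repo.store("answer", 42);
    assert(repo.fetch("answer") == 42);
}

unittest {   // bad: verifies the implementation
    Repo repo;
    repo.store("answer", 42);
    assert("answer" in repo.cache);   // couples the test to the cache
}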

Law number 1 needs some explaining. Mostly I’m only interested in mocking/stubbing/etc. functions that make a unit test not be “pure”. The usual suspects are file system operations, networking, databases, etc. I’ve seen a few examples that don’t fit those “traditional” cases but make good candidates. One that particularly opened my eyes was using a mock log object to assert that error messages were logged when calling a function with certain arguments. Another situation is that in some embedded contexts it might be hard to even get a file to compile outside of a specialised environment; mocking can make sense there too. Basically, whatever makes testing difficult should be mocked. Or, to paraphrase Einstein: “All code should be mocked as little as possible, but not less”.
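As an illustration of the log example, here’s a minimal D sketch (the Logger interface and frobnicate function are invented for the purpose): only the logging is mocked, everything else is real code.

interface Logger {
    void error(string msg);
}

class MockLogger : Logger {
    string[] messages;
    override void error(string msg) { messages ~= msg; }
}

void frobnicate(Logger log, int value) {
    if (value < 0) log.error("negative value");   // the behaviour under test
}

unittest {
    auto log = new MockLogger;
    frobnicate(log, -1);
    assert(log.messages == ["negative value"]);
}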

What are your testing laws? How do you disagree with mine?

 


Emacs as my IDE: homecoming

Once upon a time I wrote code using Turbo Pascal and Borland C. Then I discovered protected mode and heard of a compiler called DJGPP that could emit 32-bit code able to access more than 640k of memory. I had to have it. Unfortunately, I was used to my text-based IDEs and had no idea how to use a compiler from the command line. RHIDE and its near-perfect imitation of the Borland interface won me over.

At some point that I don’t quite remember (probably when I went to CERN), I started using Linux and Emacs. Emacs was weird: none of the keystroke conventions of other programs worked; it did its own thing. It took a while, but I got used to it and even modified my .emacs file, although never so much that a new installation would feel foreign. I’d only really copy elisp from the internet. I ended up using Emacs to write pretty much any code, except for Java, which I wrote in Eclipse.

When I joined Cisco, everyone in my team was using Eclipse in a Linux VM running on Windows to write C and C++. I joined the team and started resenting Eclipse more and more each day. At one point I managed to configure a Windows XP VM running Visual Studio that used up less RAM than Eclipse did on Linux! But I had to give it to the IDE: it made many things easier when working on a shared codebase. I longed for Emacs, but I didn’t want to give up the IDE features I now relied on, in order of importance:

  1. Jump to definition
  2. Autocompletion
  3. On-the-fly syntax checking
  4. Finding a file in the project
  5. Macro expansion
  6. Rename refactoring

If only I could find a way of having this in Emacs… I knew there must be a way, but I also knew it wasn’t going to be easy. So I procrastinated for years. Until last year.

Last year I went into a mad flurry of optimising my productivity. I tried different Linux distros, desktop environments, the works. I changed a lot of the way that I work. My Linux distro went from Kubuntu to Arch Linux. My shell, from bash to zsh. My favourite language was C++, now it’s D. And my “IDE” is now Emacs.

I use auto-complete-clang and flycheck for items 2 and 3 above and, until a few days ago, cscope for item 1. But there was a problem: the compiler flags used to build the project must be known for all of this to actually work. My build system knows the compiler flags, so how do I get it to pass them on?

Well, this is an Emacs post, so the answer is of course to write a new elisp package, which I called cmake-ide, since CMake is what I always use for C and C++ development. But cscope was still annoying me with its spurious matches. What if I could use clang not only for autocompletion and on-the-fly highlighting, but also for finding definitions? It turns out I can, with rtags! I integrated this into cmake-ide and now life is good.

I’m never leaving Emacs now. I have everything I had with Eclipse, and the features I missed even work better than they did there. I version control my .emacs.d directory. I write unit tests in elisp to make sure that my packages don’t break. And I’m more productive for it. Life is good.

Prefer logging to comments in tests

I see this a lot:

testMyAwesomeCode() {
    // create object
    obj = factory();

    // set things up
    obj.setup();

    // ask for authorisation
    obj.getAuth();
    // ...
}

There’s nothing inherently bad with this; it’s just that it could be a lot more useful if instead it were this:

testMyAwesomeCode() {
    log.debug("create object");
    obj = factory();

    log.debug("set things up");
    obj.setup();

    log.debug("ask for authorisation");
    obj.getAuth();
    // ...
}

The logs serve the same purpose as the comments in the original code, and now when one of the tests starts failing the logs are there to help see what’s going on and why.

I find this works best with a logging and testing system that is silent by default (i.e. the above logs aren’t printed) and can be enabled with a run-time option. That way the noise isn’t there when it’s not needed.
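In D, for instance, std.experimental.logger makes the run-time switch close to a one-liner. A minimal sketch (the --verbose flag name is my own invention):

import std.experimental.logger;

void testMyAwesomeCode() {
    trace("create object");
    // obj = factory(); and so on, as above
}

void main(string[] args) {
    import std.algorithm.searching : canFind;
    // silent by default; the run-time flag turns the logs on
    globalLogLevel = args.canFind("--verbose") ? LogLevel.all : LogLevel.off;
    testMyAwesomeCode();
}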

To learn BDD with Cucumber, you must first learn BDD with Cucumber.

So I read about Cucumber a while back and was intrigued, but never had time to properly play with it. While writing my MQTT broker, however, I kept getting annoyed at breaking functionality that wasn’t caught by unit tests. The internals were fine; the problems I was creating had to do with the actual business of sending packets. But I was busy, so I just dealt with it.

A few weeks ago I read a book about BDD with Cucumber and RSpec, but I found it a bit confusing: since the step definitions, unit tests and implementation were all written in Ruby, it was hard for me to distinguish which part was what in the whole BDD/TDD concentric cycles. Even then, I went back to that MQTT project and wrote two Cucumber features (it needs a lot more, but since it works I stopped there). These were easy enough to get going: essentially, the step definitions run the broker in another process, connect to it over TCP, send packets to it and check whether the response is the expected one. Pretty cool stuff, and it works! It’s what I should have been doing all along.

So then I started thinking about learning BDD (after all, I wrote the features for MQTT afterwards) by using it on a D project. So I investigated how I could call D code from my step definitions. After spending the better part of an afternoon playing with Thrift and binding Ruby to D, I decided that the best way to go about this was to implement the Cucumber wire protocol. That way a server would listen to JSON requests from Cucumber, call D functions and everything would work. Brilliant.

I was in for a surprise, though, used as I am to implementing protocols after reading an RFC or two. Instead of a usual protocol definition, all I had to go on was… Cucumber features! How meta. So I’d use Cucumber to know how to implement my Cucumber server. A word to anyone wanting to do this in another language: there’s hardly any documentation on how to implement the wire protocol. Whenever I got lost and/or confused I just looked at the C++ implementation for guidance. It was there that I found a git submodule with all of Cucumber’s features. Basically, you need to implement all of the “core” features first (thereby ensuring that step definitions actually work), and only then do you get to implement the protocol server itself.
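To give an idea of the shape of the thing, here’s a much-simplified D sketch (error handling elided; the port and the message contents are made up): requests arrive as JSON arrays over TCP, one per line, and the server answers with arrays like ["success", ...].

import std.json, std.socket;

void main() {
    auto listener = new TcpSocket;
    listener.bind(new InternetAddress("127.0.0.1", 54321));
    listener.listen(1);
    auto conn = listener.accept();   // cucumber connects when run with --wire

    ubyte[1024] buf;
    const len = conn.receive(buf[]);
    // e.g. ["step_matches", {"name_to_match": "..."}]
    const request = parseJSON(cast(const(char)[])buf[0 .. len]);
    // a real server dispatches on the message type; here we always succeed
    conn.send("[\"success\", []]\n");
}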

So I wanted to be able to write Cucumber step definitions in D so I could learn and apply BDD to my next project. As it turned out, I learned BDD implementing the wire protocol itself. It took a while to get the hang of transitioning from writing a step definition to unit testing but I think I’m there now. There might be a lot more Cucumber in my future. I might also implement the entirety of Cucumber’s features in D as well, I’m not sure yet.

My implementation is here.


Knowing when to abandon a project

I like to finish what I start. It feels like a failure to me to not see a project to completion, which is mostly the reason why I persevered with my PhD thesis despite not particularly liking what I was doing every day, having no money and living with my parents at age 30. So it’s with a heavy heart that I decided to stop working on my videogame.

This thing goes back a while. Once upon a time, in 1995 I think, I went to a friend’s house to play some games on his Amiga. One of them was Gravity Power, which was so cool that I wanted a version of it for the PC so I could play it at home. Since there wasn’t one, I decided I’d write it. And so I did.

The version I wrote back then was in C and x86 assembly, with 320×200 graphics on one screen: crude, but playable. It wasn’t particularly good, though. At some point I decided I’d write a better version, from scratch, with what were then high-definition graphics (640×480). I did, but soon enough my lack of experience with larger software projects showed, and it became hard to keep adding modifications to the gameplay. So I rewrote it in C++.

This was my 2nd project in C++, and it showed as well. I can’t remember much about this version; I’d have to go look it up. At some point I decided to rewrite the whole thing from scratch again. I’d toyed with enabling networking in the game, and once I realised how much work that would entail, I made an executive decision: no networking. That way I’d implement everything else I’d wanted in the game and at last finish it.

And I did, the result of which can be found at sourceforge. It turned out alright, I think. But I’d come back to it yet.

A few years later the C++11 standard came out and I wanted to know it. My background is mainly C++, so I thought it would be inexcusable for me not to know the new standard of the language I knew best. The only way to properly learn programming is to write code, so with that in mind my plan was to write the networking code for Gravastar in C++11 and bolt it onto the existing codebase. It was a lot harder than I had anticipated. I’d painted myself into a non-networking corner and decoupling everything was taking an enormous amount of time. C++ itself wasn’t helping, and the new standard had given me oh so much more rope to hang myself with. I kept putting off working on it since it was so hard, getting distracted by other projects. Then I learned D, and my desire to write any more C++ dwindled to nothing.

I’d written enough of C++11 to get a good feel for it though, and I just didn’t want to work on networked Gravastar anymore. It just wasn’t fun. And since I was doing this in my leisure time, what was the point? So… I quit. I learned C++11, I learned that I still make a lot of design mistakes in code that’s not that old, and I learned that this time it’s just better to let it go. And once I’d done that, it felt like a weight off my shoulders.

I still want to write a game with a networking component. I just don’t want to do it in C++, or to adapt a codebase designed in such a way as to never accommodate networking. It might just be time for a new project.

Increasing performance with static polymorphism (and other neat tricks)

So I wrote a serialisation library in D. I initially wrote it to try to understand and see the benefits of the compile-time reflection available in the language. I based it on the library I’d written in C++11, and as such I already had the design in my head. That design was in turn inspired by similar code a colleague had written at work. So when I wrote the code, I used dynamic polymorphism, which in D means classes.

That usage of the new operator bugged me, though. In real code (such as here) the objects doing the serialisation were always short-lived: enter function, do the job, exit. It seemed like a waste to allocate memory on the garbage-collected heap, and I started thinking about transforming them into structs, which can live on the stack. Then it dawned on me: dynamic polymorphism wasn’t ever actually needed. It’s never the case that the code doesn’t know whether it wants to marshall or unmarshall a value. That decision is always made at compile time, so the cost of the virtual functions and garbage collection was being paid for nothing. The other place that was using GC allocation was the serialiser itself, which was appending to a dynamic array. It was the simplest thing that would work, so that’s what I started out with. Of course, the same realisation could’ve happened whilst maintaining the C++ version, and there too it would’ve been possible to convert, but it’s so much pain to do it in C++ and so easy in D that it just happens naturally.
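The difference looks something like this (a sketch, not Cerealed’s actual API): the struct version needs no new, lives on the stack, and every call can be resolved, and inlined, at compile time.

interface Serialiser {   // dynamic polymorphism: virtual calls, GC heap
    void put(int value);
}

struct Cerealiser {      // static polymorphism: stack-allocated, no virtuals
    ubyte[] bytes;
    void put(int value) {
        foreach_reverse (i; 0 .. 4)
            bytes ~= cast(ubyte)(value >> (8 * i));   // big-endian bytes
    }
}

void main() {
    Cerealiser c;        // no `new` required
    c.put(42);
    assert(c.bytes == [0, 0, 0, 42]);
}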

I converted the codebase to use structs and template functions instead, breaking backwards compatibility with the old (V0.4.x) version of Cerealed. In the process, I discovered the Appender struct in std.array. This happened as a result of trying to do policy-based design, so I could separate the algorithm (in this case, how to marshall) from the process of actually writing to an OutputRange (see std.range). I had also recently read about warp and the new ScopeBuffer in Phobos (the D standard library) and wanted to see how these additions would affect performance. I wrote a small, not particularly well-written test program, which can be seen in this gist. I used both gdc and dmd. I left ldc out because its frontend (at least in the package currently available on Arch Linux) is older and can’t compile the code, and I didn’t feel like making alterations just to see how well it would do.
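For anyone who hasn’t run into Appender, a trivial usage sketch:

import std.array : appender;

void main() {
    auto buf = appender!(ubyte[])();   // growable buffer with amortised appends
    buf.put(cast(ubyte)42);            // put() is the output range primitive
    ubyte[] more = [1, 2, 3];
    buf.put(more);                     // whole ranges can be appended too
    assert(buf.data == [42, 1, 2, 3]);
}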

The results are presented in the tables below. I left out standard deviations because they were too small for nearly every measurement, so the values are just averages of a few different runs. “Classes” is the original V0.4.1 Cerealed OOP code, “Structs” is the code with structs instead of classes but still using a dynamic array, and “Appender” and “ScopeBuffer” use the Phobos structs mentioned above. Since ScopeBuffer isn’t part of my distribution of Phobos, I copied it into the project instead; that way the code can be compiled by other people who, like me, don’t compile their own versions of the compiler and standard library. I did 25M loops for serialisation and 75M loops for deserialisation, and compiled using (g)dmd with options -O -release -inline -noboundscheck.

Cerealiser (seconds, lower is better)    dmd    gdc
Classes                                  22.8   16.1
Structs (dynamic array)                  19.9   14.0
Appender                                 16.1    9.5
ScopeBuffer                               4.6    4.5

 

Decerealiser (seconds, lower is better)  dmd    gdc
Classes                                  21.7   17.3
Structs                                   8.9    9.9

 

For unmarshalling, I only compared classes vs. structs. The reason is that unmarshalling doesn’t allocate memory (it uses whatever slice is passed to it, so allocation is the responsibility of the client code), so there wasn’t much I could do to improve performance. Even then, that slight alteration causes a dramatic reduction in the time spent deserialising, with dmd making it more than twice as fast. Inlining is great!
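The principle is simply that the reader walks a slice the caller owns, along these lines (a hypothetical sketch, not Cerealed’s actual code):

struct Decerealiser {
    const(ubyte)[] bytes;   // a view into the caller's buffer; nothing is owned

    ushort readUShort() {
        const value = cast(ushort)((bytes[0] << 8) | bytes[1]);
        bytes = bytes[2 .. $];   // advance the slice: no copy, no allocation
        return value;
    }
}

unittest {
    auto dec = Decerealiser([0x01, 0x02, 0xff]);
    assert(dec.readUShort() == 0x0102);
    assert(dec.bytes == [0xff]);   // the unread bytes are still there
}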

For marshalling, the slowest and fastest versions differ by a factor of 4 to 5! ScopeBuffer made that much of a difference, despite me using, on purpose, a static array too small to hold the struct, so that it had to allocate. I tried with a larger array and there was no difference in performance. The results also show gdc generating more efficient code in most cases. The real lesson here is that using the right algorithm for the job (in this case, ScopeBuffer) makes a much larger difference than everything else.

I’m really happy with the results. None of the V0.5.0-and-newer Cerealed code needs to use the garbage collector anymore, and I even added a convenience function called cerealise that uses ScopeBuffer. It works by passing in a lambda to act on the resulting byte array, and it’s templated on the size of the static array, which defaults to 32 bytes.
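The general shape is something like this hand-rolled sketch (hypothetical names and a fixed big-endian ushort; not Cerealed’s actual implementation):

void withMarshalled(alias fun, size_t bufSize = 32)(ushort value) {
    ubyte[bufSize] buf = void;        // stack storage, no GC allocation
    buf[0] = cast(ubyte)(value >> 8);
    buf[1] = cast(ubyte)(value & 0xff);
    fun(buf[0 .. 2]);                 // hand the lambda the resulting bytes
}

unittest {
    withMarshalled!((const(ubyte)[] bytes) {
        assert(bytes == [0x01, 0x02]);
    })(0x0102);
}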

I could go back and do something similar for the C++ version, but… that would be a lot more work (moving everything into the headers alone would take quite some time), I don’t really have any projects that require super-fast serialisation, and these days I just want to hack on D anyway. I still want to finish the networking part of my game (in progress, and it won’t compile on Windows, but there are binaries for the old version on sourceforge), which means some more C++, but after that I’ll avoid it when I can. Unless the alternative is C, of course. I really dislike C.

All in all, it’s unlikely the bottleneck of any app using Cerealed will be the serialisation, but if it is… it just got a whole lot faster.


Adding Java and C++ to the MQTT benchmarks or: How I Learned to Stop Worrying and Love the Garbage Collector

This is a followup to my first post, where I compared different MQTT broker implementations written in D, C, Erlang and Go. Then my colleague who wrote the Erlang version decided to write a Java version too, and I felt compelled to do a C++11 implementation. This was only supposed to add the results of those two to the benchmarks, but unfortunately I had problems with the C++ version, which led to the title of this blog post. More on that later. Suffice it to say for now that the C++ results should be taken with a large lump of salt. Results:

loadtest (throughput - bigger is better)
Connections:         500            750            1k
D + vibe.d:          166.9 +/- 1.5  171.1 +/- 3.3  167.9 +/- 1.3
C (Mosquitto):       122.4 +/- 0.4   95.2 +/- 1.3   74.7 +/- 0.4
Erlang:              124.2 +/- 5.9  117.6 +/- 4.6  117.7 +/- 3.2
Go:                  100.1 +/- 0.1   99.3 +/- 0.2   98.8 +/- 0.3
Java:                105.1 +/- 0.5  105.8 +/- 0.3  105.8 +/- 0.5
C++11 + boost::asio: 109.6 +/- 2.0  107.8 +/- 1.1  108.5 +/- 2.6

pingtest (throughput constrained by latency - bigger is better)
parameters:          400p 20w       200p 200w      100p 400w
D + vibe.d:          50.9 +/- 0.3   38.3 +/- 0.2   20.1 +/- 0.1
C (Mosquitto):       65.4 +/- 4.4   45.2 +/- 0.2   20.0 +/- 0.0
Erlang:              49.1 +/- 0.8   30.9 +/- 0.3   15.6 +/- 0.1
Go:                  45.2 +/- 0.2   27.5 +/- 0.1   16.0 +/- 0.1
Java:                63.9 +/- 0.8   45.7 +/- 0.9   23.9 +/- 0.5
C++11 + boost::asio: 50.8 +/- 0.9   44.2 +/- 0.2   21.5 +/- 0.4

In loadtest, the C++ and Java implementations turned out to be in the middle of the pack, with comparable performance between the two. Both are slightly worse than Erlang, and D is still a good distance ahead. In pingtest it gets more interesting: Java mostly matches the previous winner (the C version) and beats it in the last benchmark, so it’s now the clear winner. The C++ version matches both of those in the middle benchmark and does well in the last one, but only performs as well as the D version in the first one. A win for Java.

Now, about my C++ woes. I brought it on myself a little bit, since the way I approached it was by trying to minimise the amount of work I had to do. After all, writing C++ takes a long while at the best of times, so I went and ported it from my D version, translating by hand. I gleaned a few insights from doing so:

  • Using C++11 made my life a lot easier since it closes the gap with D considerably. const and immutable became const auto, auto remained the same except when used as a return value, etc.
  • Having also written both C++ and D versions of the serialisation libraries I used as well as the unit-testing ones made things a lot easier, since I used the same concepts and names.
  • I’m glad I took the time to port the unit tests as well. I ended up introducing several bugs in the manual translation.
  • A lot of those bugs were initialisation errors that simply don’t exist in D. Or Java. Or Go. Sigh.
  • I hate headers with a burning passion. Modules should be the top C++17 priority IMHO, since there’s zero chance of them making it into C++14.
  • I missed slices. A lot. std::vector and std::deque are poor substitutes.
  • Porting code written in a garbage-collected language by simply introducing std::unique_ptr and std::shared_ptr where appropriate was a massive PITA. I’m not even sure I got it right; more on that below.

The C++ implementation is incomplete and will continue to be like that, since I’m now bored of it, tired, and just want to move on. It’s also buggy. All of the loadtest benchmarks were done with only 1000 messages instead of the values at the top since it crashes if left to run for long enough. I’m not going to debug it because it’s not going to be any fun and nobody is paying me to do it.

It’s not optimised either. I never even bothered to run a profiler. I was going to do it as soon as I fixed all the bugs but I gave up long before that. I know it’s doing excessive copying because copying vectors of bytes around was the easiest way I could get it to compile after copying the D code using slices. It was on my TODO list to remove and replace with iterators, but, as I mentioned, it’s not going to happen.

I reckon a complete version would probably do as well as Java at pingtest, but I have a hunch that D would probably still win loadtest. This is, of course, pure speculation. So why did I bother to include the C++ results? I thought it would still be interesting and give a rough idea of how it would compare. I wish I had the energy to finish it, but I just wasn’t having fun anymore and I don’t see the point. Writing it from scratch in C++ would have been a better idea, but it definitely would have taken longer. It would’ve looked similar to what I have now anyway (I’d still be the author), but I have a feeling it would have fewer bugs. Thinking about memory management from the start is very different from trying to apply smart pointers to an already existing design that depended on a garbage collector.

My conclusion from all of this is that I really don’t want to write C++ again unless I have to. And that, for all the misgivings I had about a garbage collector, it saves me time I would’ve otherwise spent tracking down memory leaks, double frees and all of those other “fun” activities. And, at least for this exercise, it doesn’t even seem to make a dent in performance: Java was the pingtest winner, after all, and its GC is a lot better than D’s. To add insult to C++’s injury, the Java implementation took Patrick a morning to write from scratch, and an afternoon to profile and optimise. It took me days to port an existing working implementation from the closest language there is to C++, and I ended up with a crashing binary. It just wasn’t worth the time and effort, but at least now I know that.


Go vs D vs Erlang vs C in real life: MQTT broker implementation shootout.

At work we recently started using the MQTT protocol, which uses a publish/subscribe model. It’s simple in the good way and well thought out. We went with an open source implementation named Mosquitto. A few weeks ago, on the way back from lunch break, my colleague Jeff told me he was writing an MQTT broker in Go, his new favourite language. We’re using MQTT at work and I guess he was looking for a new project to write in Go, so voilà. It should be a good fit; after all, this is the type of application Go was made for. But hubris caught up to him when he uttered “And, of course, it’ll be super fast. It won’t even be fair to other languages”. I’m paraphrasing, but that’s how I remember it. You can read Jeff’s account here.

I’m not a fan of Go at all. I wasn’t particularly impressed when I first read about it, but given how much I keep hearing about it on proggit and from Jeff himself, I gave it a go a few months back, writing a genetic algorithm framework in it. I came out of that experience liking it even less. It’s just not for me. Go is an opinionated language, which would be fine if I agreed with its creators’ opinions; the way my brain works, I’m on the opposite side of nearly all of them. It does have a few things I like. The absence of semicolons and parentheses, for instance. Goroutines and channels are a huge win. I can live without exceptions, even though I’d rather not, but generics? They can pry them from my cold, dead hands.

D, on the other hand… now we’re talking. Everything I like about C++ and more, with none of the warts. Of course, it has its own warts too, but nothing’s perfect. So, as a D fan and not so much of a Go one, I took Jeff’s statement as a gauntlet to the face. I learned of vibe.d watching the dconf2013 videos and really liked its idea of a synchronous API on top of asynchronous IO. I was convinced I could at least match a Go implementation’s performance, if not exceed it. So I wrote enough of an MQTT broker implementation to be able to run Jeff’s Go benchmark and compare performance. I reached a version that was faster than his after about 2 days. He came up with a second benchmark on which my implementation performed poorly, so I went back to optimising. Around this time another colleague wanted in on the competition and used it as an excuse to learn Erlang, writing his own implementation. A few rounds of optimising later, the results were in, and I’ve included them below. Explanations of the methodology follow.

 
loadtest (throughput - bigger is better)
Connections:   100            500            750            1k
D + vibe.d:    121.7 +/- 1.5  166.9 +/- 1.5  171.1 +/- 3.3  167.9 +/- 1.3
C (Mosquitto): 106.1 +/- 0.8  122.4 +/- 0.4   95.2 +/- 1.3   74.7 +/- 0.4
Erlang:        104.1 +/- 2.2  124.2 +/- 5.9  117.6 +/- 4.6  117.7 +/- 3.2
Go:             90.9 +/- 11   100.1 +/- 0.1   99.3 +/- 0.2   98.8 +/- 0.3

pingtest (throughput constrained by latency - bigger is better)
parameters:    400p 20w       200p 200w      100p 400w
D + vibe.d:    50.9 +/- 0.3   38.3 +/- 0.2   20.1 +/- 0.1
C (Mosquitto): 65.4 +/- 4.4   45.2 +/- 0.2   20.0 +/- 0.0
Erlang:        49.1 +/- 0.8   30.9 +/- 0.3   15.6 +/- 0.1
Go:            45.2 +/- 0.2   27.5 +/- 0.1   16.0 +/- 0.1

All of the numbers are thousands of messages received by the client application per second. All measurements were done on my laptop, a Lenovo W530 running Arch Linux so all of the TCP connections were on localhost. Each number is the mean of several measurements, and I used the standard deviation as an estimate of the systematic error. All of the MQTT broker implementations run in one system thread. Using multiple threads resulted in no performance benefits for latency and worse performance for throughput.

Mosquitto was compiled with gcc 4.8.2, the Go implementation was executed with go run, and the D implementation was compiled with dmd 2.064.2. As for the Erlang version, I’m not sure: I installed the Arch Linux erlang package and used my colleague’s Makefile without looking at it.

The two benchmarks are loadtest and pingtest. The former measures throughput whereas the latter measures latency. In loadtest, a few hundred connections are set up to the broker. Half of these subscribe to a topic and the other half publish to that topic as fast as possible. The benchmark ends when all of the subscribers have received a certain number of messages, determined by a command-line argument. I varied the number of connections to see how that would affect each broker. There was no contest here: the D implementation was by far the fastest. With 100 connections, I think there wasn’t enough work to do, so all implementations ended up waiting on IO. Except for Mosquitto, they all scaled rather nicely. I had problems measuring Jeff’s implementation due to a bug. He knows about the bug but just can’t be bothered fixing it. The numbers were taken from Go 1.1 (the pingtest numbers are Go 1.2). When his implementation works, Go 1.2 produces a binary that performs on the order of 10-15% faster than the numbers above, which might mean equivalent performance to the Erlang implementation. I even think the bug shows up more often in Go 1.2 exactly because the resulting binary is more performant.

In pingtest Jeff tried to write a better benchmark; it measures latency. The two main command-line arguments are the number of connection pairs and the number of wildcard subscribers. For each pair, one of the connections subscribes to a request topic unique to that pair and the partner connection subscribes to a reply topic. One partner publishes a request and waits for the other connection to publish a reply. The number of messages sent per second now depends on the round-trip time between these two. Additionally, the wildcard subscribers receive both the request and reply messages of the first connection pair. The number before the ‘p’ is the number of connection pairs, and the number before the ‘w’ is the number of wildcard subscriber connections. Here Mosquitto is the fastest, but the performance difference diminishes with more wildcards, and it’s on par with the D implementation in the last column. I’m not sure why it’s the fastest. There’s a possibility that vibe.d might be switching to the “wrong” fiber, but that’s pure speculation on my part.

What about readability and ease of writing? I can’t read Erlang, so I can’t comment on that. Despite my preference for D, I think the D and Go implementations are equally readable. Since the Erlang unit tests are in the same files as the implementation, it’s hard to know exactly how many lines long it is. Comparisons are further muddied by the fact that the Erlang version implements most of MQTT, while the D implementation essentially only implements what’s necessary to run the benchmarks. With those caveats (and the fact that dependencies aren’t counted), the three implementations clock in at somewhere between 800 and 1000 lines each, without filtering out blank lines and comments.

Could they be optimised further? Probably. In the end, the choice of algorithms and data structures matters more than the programming language, so my personal advice is to choose the language that makes you productive. None of them magically made the implementations performant; we all had to profile, analyse, optimise, try ideas and measure. I loved writing it in D, but then again I’m a convert. I particularly enjoyed using the serialisation library I wrote for it, Cerealed. Much of the typical bit-twiddling boilerplate in networking code disappeared, and that was only made possible by D’s compile-time reflection and user-defined attributes.
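To give a flavour of why, here’s a tiny hedged sketch (made-up names, not Cerealed’s actual code) of serialising any struct by walking its fields at compile time:

import std.traits : isIntegral;

ubyte[] marshal(T)(in T value) {
    ubyte[] bytes;
    foreach (field; value.tupleof) {   // fields are iterated at compile time
        static if (isIntegral!(typeof(field)))
            foreach_reverse (i; 0 .. typeof(field).sizeof)
                bytes ~= cast(ubyte)(field >> (8 * i));   // network byte order
    }
    return bytes;
}

struct ConnectHeader {   // made-up MQTT-ish struct for illustration
    ubyte fixedHeader;
    ushort remainingLength;
}

unittest {
    assert(marshal(ConnectHeader(0x10, 42)) == [0x10, 0, 42]);
}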

Source:

D: https://github.com/atilaneves/mqtt
C: https://bitbucket.org/oojah/mosquitto/
Go: https://github.com/jeffallen/mqtt
Erlang: https://bitbucket.org/pvalsecc/erlangmqtt