Monthly Archives: April 2016

Some code you just can’t unit test

My definition of unit tests precludes them from communicating with the outside world via the file system or networking. If your definition is different, your mileage on this post may vary.

I’m a big unit test enthusiast who uses TDD for most code I write. I’m also a firm believer in the testing pyramid, so I consider unit tests to be more important than the slower, flakier, more expensive tests. However, I’ve recently come to the conclusion that obsessing over unit tests to the detriment of the ones higher up the pyramid can be harmful. I guess the clue was “obsessing”.

In two recent projects of mine, I’ve written unit tests that I now consider utterly useless. More than that, I think the codebases of those two projects would be better off without them and that I did myself a disservice by writing them in the first place.

What both of these projects share in common is that they generate code for other programs to consume. In one case, generating build systems in GNU Make or Ninja, and in the other converting from GNU Make to D (so, basically the other direction). This means writing to files, which as mentioned above is a unit test no-no as far as I’m concerned. The typical way to get around this is to write a pure function that returns a string and a very small wrapper function that calls the pure one to write to a file. Now the pure function can be called from unit tests that check the return value. Yay unit tests? Nope.

Add another section to the output? Your tests are broken. Comments? Your tests are broken. Extra newlines? Your tests are broken. In none of these scenarios is the code buggy, and yet, in all of them N tests have to be modified even though the behaviour is the same. In one case I had passing unit tests checking for code generation when the output wouldn’t even compile!

If your program is generating C code, does it really matter what order the arguments to an equals expression are written in? Of course not. So what does matter? That the code compiles and has the intended semantics. It doesn’t matter what the functions and variables you generate are called. Only that if you compile it and you run it, that it does the right thing.

Code that generates output for another program to consume is inherently un-unit-testable. The only way to know it works is to call that program on your output.

In my GNU Make -> D case I was lucky: since I was using D to generate D code, I generated it at compile-time and  mixed it back in to test it so I had my unit test cake and ate it too. Compile times suffered, but I didn’t have to compile and link several little executables to test it. In most other languages, the only way forward would be to pay “the linker price”.

Essentially, my tests were bad because they tested implementation instead of behaviour, which is always a bad idea. I’ll write more about that in a future post.