Tag Archives: testing

My DConf2019 talk

I’ll be speaking at DConf this year. I haven’t yet decided how to structure the talk, but I’d like to take a page out of Dan Saks’s excellent CppCon 2016 talk that covered the technical and psychological aspects of language adoption. In his case, he was addressing a C++ audience on the difficulty of convincing C programmers to adopt C++ (an admirable goal!). In mine, it’s: how do I out-C++ C++?

The genesis for my talk was my decision a few years ago to write tests for a legacy C codebase. I wanted to use D, but even as a fan of the language I chose to write the tests in C++ instead. And if even I wouldn’t have picked D for that task, who would?

I still think it was the right decision to make. The first step in “importing” the legacy project into a C++ test looks like this:

extern "C" {
    #include "legacy.h"
}

There is no second step. From here one #includes a test library such as catch2 and starts writing tests for the C functions in the API.

As far as I know, every language under the sun can interface with C. There’s usually (always?) some way to do FFI, and examples invariably show how to call a function that takes 2 integers, maybe even a const char* parameter. Unfortunately, that’s not what real codebases need to do. They need to call a function that takes a pointer to a structure that’s defined in a different header, which has 3 pointers to structures that are themselves defined in different headers, which…
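In D, for instance, writing the declarations by hand means chasing those definitions through header after header yourself. The result looks something like this, with every name made up for illustration:

// hand-written D declarations for a C API (illustrative only)
struct Options;                    // opaque here; really defined in options.h
struct Logger;                     // defined in logger.h, which includes more headers...
struct Context {
    Options* options;
    Logger* logger;
    int flags;                     // hopefully still in sync with the C header
}

extern(C) int legacy_run(Context* ctx);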

You get the idea. Writing the definitions by hand is error-prone, boring, and time-consuming. And even if you get them all right, nobody seems to talk about the elephant in the C API room: macros. Not once have I encountered a non-trivial C API that doesn’t require the C preprocessor to use properly. C++ users are at an advantage here: just use the macros. Everyone else has to get by with ad-hoc solutions. And if the function-like macro implementations change, the code will break.

I think the only way to stop C++ from being the obvious and/or only choice is for a challenger to let one write `#include "legacy.h"` and have it work just as easily as it does natively. That’s why I started the dpp project, and I’m looking forward to talking about the technical challenges I’ve encountered on my journey so far.
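The idea behind dpp is that the D equivalent of the C++ snippet above is just as short: a .dpp file that the tool translates before the D compiler ever sees it. A sketch of what I mean, with made-up function names:

// test_legacy.dpp: translated to plain D by dpp before compilation
#include "legacy.h"

unittest {
    // the C API, macros included, is usable directly
    auto ctx = legacy_create();
    scope(exit) legacy_destroy(ctx);
}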

See you at DConf!


When you can’t (and shouldn’t) unit test

I’m a unit test aficionado and, as such, have attempted to unit test what really shouldn’t be unit tested. It’s common to get excited by a new hammer and then see nails everywhere, and unit testing can get out of hand (cough! mocks! cough!).

I still believe that the best tests are free from side-effects, deterministic, and fast. What’s important to me isn’t whether or not this fits someone’s definition of what a unit test is, but that these attributes rule out slow and/or flaky tests. There is, however, another class of tests that are the bane of my existence: brittle tests. These are the ones that break when you change the production code despite your app/library still working as intended. Sometimes, insisting on unit tests means they break for no reason.

Let’s say we’re writing a new build system. Let’s also say that said build system works like CMake does and spits out build files for other build systems such as ninja or make. Our unit test fan comes along and writes a test like this:

assert make_output == "all: foo\nfoo: foo.c\n\tgcc -o foo foo.c"

I believe this to be a bad test, and the reason why is that it’s checking the implementation instead of the behaviour of the production code. Consider what happens when the implementation is changed without affecting behaviour:

all: foo\nfoo: foo.c\n\tgcc -o $@ $<

The behaviour is the same as before: any time `foo.c` is changed, `foo` will get recompiled. The implementation not only isn’t the same, it’s arguably better now, and yet the assertion in the test would fail. I think we can all agree that the ROI for this test is negative if this is all it takes to break it.

The orthodox unit test approach to situations like these is to mock the service in question, except most people don’t get the memo that you should only mock code you own. We don’t control GNU make, so we shouldn’t be doing that. It’s impossible to copy make exactly in a mock/stub/etc., and it’s foolish to even try. We (mostly) don’t care about the string that our code outputs; we care that make interprets that string with the correct semantics.

My conclusion is that I shouldn’t even try to write unit tests for code like this. Integration tests exist for a reason.
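For this build system example, an integration test would instead generate the build file, run the real make, and assert on the behaviour. A minimal sketch in D, where generateMakefile stands in for whatever the production code actually exposes:

import std.file : exists, remove, write;
import std.process : execute;

// stand-in for the build system code under test
string generateMakefile() {
    return "all: foo\nfoo: foo.c\n\tgcc -o $@ $<\n";
}

unittest {
    write("foo.c", "int main() { return 0; }\n");
    write("Makefile", generateMakefile());
    scope(exit) { remove("foo.c"); remove("Makefile"); }

    const result = execute(["make", "foo"]);
    assert(result.status == 0);
    assert(exists("foo"));   // the behaviour we care about: foo got built
}

It is slower and it needs a real make installed, but it only fails when the behaviour changes.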


Unit Testing? Do As I Say, Don’t Do As I Do

I’m a firm believer in unit testing. I’ve done more tech talks on the subject than I’d care to count, and always tell audiences the same thing: prefer unit tests, here’s a picture of the testing pyramid, keep unit tests pure (no side-effects), avoid end-to-end tests (they’re flaky, people will stop paying attention to red builds since all builds will be red). I tell them about adapters, ports and hexagonal architecture. But when it comes to using libclang to parse and translate C and C++ headers, I end up punting and writing a lot of integration tests instead. Hmm.

I know why people write tests with side-effects, and why they end up writing integration and end-to-end ones instead of the nice pure unit test happy place I advocate. It’s easier. There’s less thinking involved. A lot less. However, taking the easy path has always come back to bite me. Those kinds of tests take longer. The higher up the test pyramid you go, the flakier they get. TCP ports stay open longer than a tester would like, for instance. The network goes down. All sorts of things.

I understand why I wrote integration tests instead of unit tests when interfacing with libclang too. As for everyone else, it was just easier. I failed to come up with a plan to unit test what I was doing. It didn’t help that I’d never used libclang and had no idea what the API looked like or what it allowed me to do. It also doesn’t help that libclang doesn’t have an option to take a string with the code to parse and instead takes a file name, but I can work around that.

Because of this, the dpp codebase currently suffers from a lack of separation of concerns. Code that translates C/C++ to D is now intimately tied to libclang and its quirks. If I ever try to use something other than libclang, I won’t be able to. All of the bad things I caution everybody else about? I guarantee they happened in one of my newest projects.

Before the code collapses under its own complexity, I’ve decided to do what I should’ve done all along and am rewriting dpp so it uses layers to get away from the libclang mess. I’m still figuring it all out, but the main idea is to have a transformation layer between libclang and my code that takes its data types and converts them to a new set of AST types that are my own. From then on it should be trivial to unit test the translation of those AST types that represent C or C++ code into D. Funnily enough, the fact that I wrote so many integration tests will keep me honest since all of those old tests will still have to pass. I’m not sure how I feel about that.
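To make the idea concrete, here is roughly the shape I have in mind. All of the type and function names below are illustrative, not what dpp actually ends up calling them:

// my own AST type, with no libclang in sight
struct Function {
    string name;
    string returnType;
    string[] paramTypes;
}

// layer 1: the only code allowed to touch libclang, kept as thin as possible
// Function fromCursor(CXCursor cursor) { ... }

// layer 2: pure, deterministic, and trivially unit-testable
string translate(in Function f) {
    import std.array : join;
    import std.format : format;
    return format!"extern(C) %s %s(%s);"(f.returnType, f.name, f.paramTypes.join(", "));
}

unittest {
    assert(translate(Function("add", "int", ["int", "int"]))
           == "extern(C) int add(int, int);");
}

Layer 1 still needs integration tests, but it can stay small; everything in layer 2 can be tested without libclang ever being loaded.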

I might do another blog post covering how I ended up porting a codebase with pretty much only integration tests to the unit variety. It might be of interest to anyone maintaining a legacy codebase (i.e. all of us).


Keep D unittests separated from production code

D has built-in unit tests, and unittest is even a keyword. This has been fantastically successful for the language: there’s no need for an external framework to write tests, since it comes with the compiler. Just as importantly, a unittest after a function can be used as documentation, with the test(s) showing up as “examples”. This is the opposite of Python’s approach of running the code in documentation as tests: the documentation is generated from the tests instead.
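For anyone who hasn’t seen it, a documented unittest looks something like this (a made-up example):

/// Computes the factorial of `n`.
uint factorial(uint n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

///
unittest {
    // shows up under "Example" in the generated documentation
    assert(factorial(4) == 24);
}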

As such, in D (similarly to Rust), it’s usual, idiomatic even, to have the tests written next to the code they’re testing. It’s easy to know where to see examples of the code in action: scroll down a bit and there are the unit tests.

I’m going to argue that this is an anti-pattern.

Let me start by saying that some tests should go along with the production code: exactly the kind of example-like tests that only exercise the happy path. Have them be executable documentation, but only have one of those per function and keep them short. The others? Hide them away as you would in C++. Here’s why I think that’s the case:

They increase build times.

If you edit a test, and that test lives next to production code, then every module that imports that module has to be rebuilt, because there’s currently no good way to figure out whether or not any of the API/ABI of that module has changed. Essentially, every D module is like a C++ header, and you go and recompile the world. D compiles a lot faster than C++, but when you’re doing TDD (in my case, pretty much always), every millisecond in build times counts.

If the tests are in their own files, then editing a test means that usually only one file needs to be recompiled. Since test code is code, recompiling production code and its tests takes longer than just compiling production code alone.

I’m currently toying with the idea of trying to compile per package for production code but per module for test code – the test code shouldn’t have any dependencies other than the production code itself. I’ll have to time it to make sure it’s actually faster.

version(unittest) will cause you problems if you write libraries.

Let’s say that you’re writing a library. Let’s also say that to test that library you want to have a dependency on a testing library from http://code.dlang.org/, like unit-threaded. So you add this to your dub.sdl:

configuration "default" {
}
configuration "unittest" {
     dependency "unit-threaded" version="~>0.7.0"
}

Normal build? No dependency. Test build? Link to unit-threaded, but your clients never have the extra dependency. Great, right? So you want to use unit-threaded in your tests, which means an import:

module production_code;
version(unittest) import unit_threaded;

Now someone goes and adds your library as a dependency in their dub.sdl, but they’re not using unit-threaded because they don’t want to. And now they get a compiler error because when they compile their code with -unittest, the compiler will try and import a module/package that doesn’t exist.

So instead, the library has to do this in its dub.sdl:

configuration "unittest" {
    # ...
    versions "TestingMyLibrary"
}

And then:

version(TestingMyLibrary) import unit_threaded;

It might even be worse: your library might have code that should exist for version(unittest) but not for version(TestingMyLibrary). It’s happened to me, and it has even happened in the standard library.

Keep calm and keep your tests separated.

You’ll be happier that way. I am.


DSLs: even more important for tests

Last week I wrote about the benefits of Domain Specific Languages (DSLs). Since then I’ve been thinking and realised that DSLs are even more important when writing tests. It just so happened that I was writing tests in Emacs Lisp for a package I wrote called cmake-ide, and given that Lisp has macros I was trying to leverage them for expressiveness.

Like most other programmers, I’ve been known from time to time to want to raze a codebase to the ground and rewrite it from scratch. The reason I don’t, of course, was aptly put by Joel Spolsky years ago. How could I ensure that nobody’s code would break? How can I know the functionality is the same?

The answer to that is usually “tests”, but if you rewrite from scratch, your old unit tests probably won’t even compile. I asked myself why not, why is it that the tests I wrote weren’t reusable. It dawned on me that the tests are coupled to the production code, which is never a good idea. Brittle tests are often worse than no tests at all (no, really). So how to make them malleable?

What one should do is take a page from Cucumber and write all tests using a DSL, avoiding at all costs specifying how anything gets done and focussing on the what. In Lisp-y syntax, avoid:

(write-to-file "foo.txt" "foobarbaz")
(open-file "foo.txt")
(run-program "theapp" "foo.txt" "out.txt")
(setq result (parse-output "out.txt"))
;; assertion here on result

Instead:

(with-run-on-file "theapp" "foo.txt" "foobarbaz" "out.txt" result
  ;; assertion here on result
  )

Less code, easier to read, probably more reusable. There are certainly better examples; I suggest consulting Cucumber best practices on how to write tests.

Not every language will offer the same DSL liberties and so your mileage may vary. Fortunately for me, the two languages I’d been writing tests in were Emacs Lisp and D, and in both of those I can go wild.
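In D, a comparable “what, not how” helper can be written with a delegate. This is just a sketch with made-up names, not code from any of my projects:

import std.file : readText, write;
import std.process : execute;

// runs `app` on a file with the given contents and hands the output to the
// assertion, hiding all of the file and process plumbing
void withRunOnFile(string app, string inputName, string contents, string outputName,
                   void delegate(string result) assertion)
{
    write(inputName, contents);
    execute([app, inputName, outputName]);
    assertion(readText(outputName));
}

unittest {
    withRunOnFile("theapp", "foo.txt", "foobarbaz", "out.txt",
                  (result) { /* assertion here on result */ });
}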


Commit failing tests if your framework allows it

In TDD, one is supposed to go through the 3-step cycle of:

  1. Write a failing test
  2. Make it pass
  3. Refactor

The common-sense approach is to not commit the failing test from the first step, since that would throw a spanner in the works when you inevitably have to bisect your commit DAG trying to figure out where a bug was introduced.

I’ve come to a realisation recently: failing tests should be committed, but only if the testing framework being used allows you to mark failures as successes. For instance, in my D testing framework unit-threaded, I’d commit this silly example:

@ShouldFail("WIP")
unittest {
    1.shouldEqual(2);
}

If you’re not familiar with D, it has built-in unit tests, and unittest is a keyword. @ShouldFail is a User Defined Attribute, part of the library, indicating that the unit test it applies to is expected to fail, and it allows the user to specify an optional string describing why that’s the case. It could be a bug ID as well.

The test above passes if any of the code in the unittest block throws an exception, i.e. it passes if it fails. This way we can have a single commit of the failing test that motivated the code changes that follow it, and we can’t forget to remove @ShouldFail – in fact, if the programmer implements the feature / fixes the bug correctly, they should expect to see the test suite go red. If that doesn’t happen, either the production code or the test is buggy.

I’m not aware of many frameworks that allow a programmer to do this; pytest has something similar with xfail. If yours does, commit your failing tests.


Write custom assertions whenever possible

I’ve been very interested in readable tests with great error messages recently. Mostly because they kept failing and I wanted the most information possible in order to quickly identify the cause. This is another reason why I like TDD: you see the test failing first, so if the error message isn’t great you’ll know straight away instead of months later.

The good testing frameworks provide a way of writing your own custom assertions. I’d never really looked into them that much before, but now I realize the error of my ways. Recently I wrote a test that contained this line:

fileName.exists.shouldBeTrue;

Readable, right? The problem is when it fails:

foo.d:42 - Expected: true
foo.d:42 -      Got: false

And now you have to go read the test and figure out what went wrong. It’s a lot better to be told right away that a file that was supposed to exist doesn’t. So I wrote a custom assertion and was then ready to write this:

fileName.shouldExist;

With the corresponding failure message:

foo.d:42 - Expected /tmp/foo.txt to exist but it didn't

Now it’s a lot easier to pinpoint where the problem is. For starters, you’d probably go straight to checking the contents of the surrounding directory, having saved the time you’d otherwise have spent figuring out what exactly was false.
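For reference, an assertion like that only takes a few lines to write. This is a simplified sketch rather than the actual unit-threaded implementation:

void shouldExist(in string fileName,
                 in string file = __FILE__, in size_t line = __LINE__)
{
    import std.file : exists;
    if (!fileName.exists)
        throw new Exception("Expected " ~ fileName ~ " to exist but it didn't",
                            file, line);
}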


unit-threaded: now an executable library

It’s one of those ideas that seem obvious in retrospect, but somehow only occurred to me last week. Let me explain.

I wrote a unit testing library in D called unit-threaded. It uses D’s compile-time reflection capabilities so that no test registration is required. You write your tests, they get found automatically and everything is good and nice. Except… you have to list the files you want to reflect on, explicitly. D’s compiler can’t go reading the filesystem for you while it compiles, so a pre-build step of generating the file list was needed. I wrote a program to do it, but for several reasons it wasn’t ideal.

Now, as someone who actually wants people to use my library (and also to make it easier for myself), I had to find a way to make it easy to opt in to unit-threaded. This is especially important since D has built-in unit tests, so the barrier for entry is low (which is a good thing!). While working on a far crazier idea to make it a no-brainer to use unit-threaded, I stumbled across my current solution: run the library as an executable binary.

The secret sauce that makes this work is dub, D’s package manager. It can download dependencies, compile them, and even run them with “dub run”; that way, a user doesn’t even have to download the tool themselves. The other dub feature that makes this feasible is that it supports “configurations”, in which a package is built differently. Using those, I can have a regular library configuration and an alternative executable one. Since dub run can take a configuration as an argument, unit-threaded can now be run as a program with “dub run unit-threaded -c gen_ut_main”. And when it is, it generates the file that’s needed to make it all work.
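For the curious, the generated file boils down to something like the following; the module names are placeholders and the exact output has changed over time:

import unit_threaded;

int main(string[] args) {
    return args.runTests!(
        "mylib.foo",
        "mylib.bar"
    );
}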

So now all a user needs to do is add a declaration to their project’s dub.json file and “dub test” works as intended, using unit-threaded underneath, with named unit tests and all of them running in threads by default. Happy days.
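As a taste of what that buys you, this is roughly what a named unit-threaded test looks like:

import unit_threaded;

@("the ultimate answer")
unittest {
    // the string above is the test's name: it shows up in reports and can be
    // used to select just this test from the command line
    (6 * 7).shouldEqual(42);
}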


The importance of making the test fail

TDD isn’t for everyone. People work in different ways and their brains even more so, and I think I agree with Bertrand Meyer in that whether you write the test first or last, the important thing is that the test gets written. Even for those of us for whom TDD works, it’s not always applicable. It takes experience to know when to do it and when not to. For me, whenever I’m not sure of exactly what I want to do and am doing exploratory work, I reach for a REPL when I can and don’t even think of writing tests. Even then, by the time I’ve figured out what to do I usually write tests straight afterwards. But that’s me.

However, when fixing bugs I think “TDD” (there’s not any design going on, really) should be almost mandatory. Almost, because I thought of a way that works that doesn’t need the test to be written first, but it’s more cumbersome. More on that later.

Tests are code. Code is buggy. Ergo… tests will contain bugs. So can we trust our tests? Yes, and especially so if we’re careful. First of all, tests are usually a lot smaller than the code they test (they should be!). Less code means fewer bugs on average. If that doesn’t give you a sense of security, it shouldn’t. The important thing is making sure that it’s very difficult to introduce simultaneous bugs in the test and production code that cancel each other out. Unless the tests are tightly coupled with the production code, that comes essentially for free.

Writing the test to reproduce a bug is important because we get to see it go from red to green, which is what gives us confidence. I’ve lost count of how many fake greens I’ve had due to tests that weren’t part of the suite, code that wasn’t even compiled, bugs in the test code, and many other reasons. Making it fail is important. Making changes in a different codebase (the production code) and then seeing the test pass means we’ve actually done something. If at any point things don’t work as they should (red -> green), then we’ve made a mistake somewhere. The fix is wrong, the test is buggy, or our understanding of the problem and what causes it might be completely off.

Reproducing the bug accurately also means that we don’t start off on the wrong foot. You might think you know what’s causing the bug, but what better way to check than to write a failing test? Now, it’s true that one can fix the bug first, write the test later, and use the VCS to go back in time and do the red/green dance. But to me that sounds like a lot more work.

Whether you write tests before or after the production code, make sure that at least one test fails without the bugfix, even if only because you changed a number in the production code to force it. I get very suspicious when the test suite stays green for a while. Nobody writes correct code that often. I know I don’t.


To learn BDD with Cucumber, you must first learn BDD with Cucumber.

So I read about Cucumber a while back and was intrigued, but never had time to properly play with it. While writing my MQTT broker, however, I kept getting annoyed at breaking functionality that wasn’t caught by unit tests. The reason being that the internals were fine; the problems I was creating had to do with the actual business of sending packets. But I was busy, so I just dealt with it.

A few weeks ago I read a book about BDD with Cucumber and RSpec but for me it was a bit confusing. The reason being that since the step definitions, unit tests and implementation were all written in Ruby, it was hard for me to distinguish which part was what in the whole BDD/TDD concentric cycles. Even then, I went back to that MQTT project and wrote two Cucumber features (it needs a lot more but since it works I stopped there). These were easy enough to get going: essentially the step definitions run the broker in another process, connect to it over TCP and send packets to it, evaluating if the response was the expected one or not. Pretty cool stuff, and it works! It’s what I should have been doing all along.

So then I started thinking about learning BDD (after all, I wrote the features for MQTT afterwards) by using it on a D project. So I investigated how I could call D code from my step definitions. After spending the better part of an afternoon playing with Thrift and binding Ruby to D, I decided that the best way to go about this was to implement the Cucumber wire protocol. That way a server would listen to JSON requests from Cucumber, call D functions and everything would work. Brilliant.

I was in for a surprise though, me who’s used to implementing protocols after reading an RFC or two. Instead of a usual protocol definition all I had to go on was… Cucumber features! How meta. So I’d use Cucumber to know how to implement my Cucumber server. A word to anyone wanting to do this in another language: there’s hardly any documentation on how to implement the wire protocol. Whenever I got lost and/or confused I just looked at the C++ implementation for guidance. It was there that I found a git submodule with all of Cucumber’s features. Basically, you need to implement all of the “core” features first (therefore ensuring that step definitions actually work), and only then do you get to implement the protocol server itself.

So I wanted to be able to write Cucumber step definitions in D so I could learn and apply BDD to my next project. As it turned out, I learned BDD implementing the wire protocol itself. It took a while to get the hang of transitioning from writing a step definition to unit testing but I think I’m there now. There might be a lot more Cucumber in my future. I might also implement the entirety of Cucumber’s features in D as well, I’m not sure yet.

My implementation is here.
