Line coverage isn’t as important as most people think

Controversial, I know. Let me start by saying this: I think line coverage tools are useful and should be used. But I think most people get a false sense of security by shooting for what I think are meaningless metrics such as achieving x% line coverage.

One of the problems is coupling: good tests aren’t coupled to the implementation code and one should be free to change the implementation completely without breaking any tests. But line coverage is a measurement of supposed test quality that is completely dependent on the implementation! If that doesn’t sound alarm bells, it should. I could replace the implementation and the line coverage would probably change. Did the quality of my tests change? Obviously not.

Another problem I’ve seen is that code coverage metrics cause people to write “unit tests” for utility functions that print out data structures. There are no assertions in the “test” at all, all it does is call the code in question to get a better metric. Is that really providing a stronger guarantee that the software works as intended? Beware the cobra effect, and, as nearly always, Dilbert has something to say about the danger of introducing metrics and encouraging engineers to make them better.

Last week at work I encountered yet another real-life example of how pursuing code coverage by itself can be fruitless endeavour. I wrote a UT for a C function that was something like this:

int func(struct Foo* foo, struct Bar* bar);

So I started out with my valgrind-driven development and ended up filling up both structs with suitable values, and all the test did was assert on the return value. I looked at the line coverage and the function was nearly 100% covered. Great, right? No. The issue is that, in typical C fashion, the return code in this function in particular wasn’t nearly as interesting as the side effects to the passed-in structs, and I hadn’t checked those at all. Despite not having tested that the function actually did what it was supposed to, I had nearly 100% line coverage. By that metric alone my test was great. By the metric of preventing bugs… not so much.

So what is line coverage good for? In my opinion, identifying the gaps in your testing. Do you really care if no test calls your util_print function? Probably not, so seeing that as not covered is ok. Any if statement that isn’t entered (or else clause) however… you probably want to take a look at that. I tend to do TDD myself, so my line coverage is high just because lines of code don’t get written unless there’s an accompanying test. Sometimes I forget to test certain inputs, and the line coverage report lets me know I have to write a few more tests. But depending on it as a metric and seeing higher coverage in and of itself as a goal? That’s not something I believe in. At the end of the day, code with 100% coverage still has bugs. The important thing is to identify techniques for reducing the probability of writing buggy code or introducing them into code that worked.

If you’re interested in a thoughtful analysis of code coverage, I really enjoyed this article.

Binary serialisation made even easier: no boilerplate with Cerealed

I abhor boilerplate code and duplication. My main governing principle in programming is DRY. I tend not to like languages that made me repeat myself a lot.

So when I wrote a serialisation library, I used it in real-life programs and made sure I didn’t have to repeat myself as I had to when I wrote similar code in C++. The fact that D allows me to get the compiler to write so much code for me, easily, is probably the main reason why it’s my favourite language.

But there were still patterns of emerging in the networking code I wrote that started to annoy me. If I’m reaching for nearly identical code for completely different networking protocol packets, then there’s a level of abstraction I should be using but failed to notice beforehand. The patterns are very common in networking, and that’s when part of the packet contains either the total length in bytes, or length minus a header size, or “number of blocks to follow”. This happens even in something as simple as UDP. With the previous version of Cerealed, you’d have to do this:

struct UdpPacket {
    ushort srcPort;
    ushort dstPort;
    ushort length;
    ushort checksum;
    @NoCereal ubyte[] data; //won't be automatically (de)serialised

    //this function is run automatically at the end of (de)serialisation
    void postBlit(C)(ref C cereal) if(isCereal!C) {
        int headerSize = srcPort.sizeOf + dstPort.sizeof + length.sizeof + checksum.sizeof;
        cereal.grainLengthedArray(data, length - headerSize); //(de)serialise it now
    }
}

Which, quite frankly, is much better than what usually needs to be done in most other languages / frameworks. This was still too boilerplatey for me and got got old fast. So now it’s:

struct UdpPacket {
    //there's an easier way to calculate headerSize but explaining how'd make the blog post longer
    enum headerSize = srcPort.sizeOf + dstPort.sizeof + length.sizeof + checksum.sizeof; 
    ushort srcPort;
    ushort dstPort;
    ushort length; 
    ushort checksum; 
    @LengthInBytes("length - headerSize") ubyte[] data; 
    //code? We don't need code 
}

That “length – headerSize” in @LengthInBytes? That’s not a generic name, that’s a compile-time string that refers to the member variable and manifest constant (enum) declared in the struct. The code to actually do the necessary logic is generated at compile-time from the struct declaration itself.

Why write code when I can get the compiler to do it for me? Now try doing that at compile-time in any other language! Well, except Lisp. The answer to “can I do ___ in Lisp” is invariably yes. And yet, not always this easily.

Cerealed is on github and is available as a dub package.

CppCon 2015

I submitted a talk proposal to CppCon for this year’s conference but unfortunately I won’t be speaking. According to the reviewers, my abstract was too vague so I guess I’ll have to detail everything next time around.

I was going to talk about how to leverage C++14 for unit-testing legacy C code. You could use C to unit-test C code, but C++’s near-perfect backwards compatibility and its many, many features make writing the test code a lot easier.

I’m going to try to go to the conference anyway. Maybe I’ll get a lightning talk in presenting my Emacs package for C and C++ development.

The importance of setting a good example

Most developers will pay just enough attention and do just enough work to get their current task done. To that end, and to reduce mental burden, whatever patterns the current codebase does are copied/extended/emulated. After all, most development work consists of changing existing code to do something else as well as whatever it did before.

As I mentioned to a colleague once, “code cruft happens little by little, never in one fell swoop”.

So it’s important to set a good example by not doing things the hacky way. Or, at least, by not leaving the code hacky by the time the next developer looks at it. The other day at work I implemented a proof-of-concept by modifying my carefully designed pristine code and, because it was easier, added a global variable to get the job done. It was a quick hack, I knew it was a bad idea and I could always fix it later, right?

Fast-forward a few weeks and 2 other developers had been working on this code. Now we had a dozen global variables. The two other devs aren’t bad at what they do; but having had a global there in the first place caused new ones to grow. Bad ideas are infectious.

So be sure to try and do things the right way. Other people will copy the patterns you write, whether good or bad.

My DConf 2015 talk: Behaviour-Driven Development with D and Cucumber

Here’s my DConf 2015 on testing and BDD for D applications using Cucumber:

Behaviour-Driven Development with D and Cucumber

As usual, I talked a lot faster than normal and finished with plenty of time to go. I think it went well though, my super-tired-eyes-because-of-jetlag notwithstanding.

Some little-known reasons why D makes day-to-day development easier

When discussing programming languages, most people focus on the big, sexy features: native compilation; functional programming; concurrency. And it makes sense, for the most part these are the features that distinguish programming languages from one another. But the small, little-known features can make regular day-to-day coding a lot easier.

My favourite language is D, as I’ve mentioned before. But languages aren’t like football teams; I’m not a D fan because my dad is or anything like that, I like writing code in the language because I feel I’m more productive. There are several reasons for that, but today I want to talk about the unsexy little things that help me crank out working code faster.

writeln: yes, it’s basically printf. But it’s actually so much more. In most languages, printing out built-in types is easy. “printf(“%d”, intvar)” in C or “cout << intvar” in C++. But you want to actually print one of your own types… you have to write code. Lots of it. Not so in D, the defaults just work. The only real exception are classes, for which you need to override toString. But idiomatic D code doesn’t use many of those, preferring structs. The other good thing about types being easy to print is that you can paste them in to another D source files and compile it. I’ve had to do that a few times.

enums: other languages have enums, they don’t even look very different in D:

enum MyEnum {
    foo,
    bar,
    baz,
}

So how do they help me more than in other languages? First of all, I’ll refer you back to my writeln point: when you print them out you get their string representation, not a number. Internally they’re really just a number like in C, but you never really have to care or worry.

Secondly, “final switch” makes enums a lot more useful by making sure you deal with every single enumeration. Add another value? Your “final switch” code will break, as it should. For those not following the link, in D a final switch on an enum value means that the code will fail to compile unless all enum values have a case statement.

getopt: Defined in std.getopt, it does the very unsexy task of parsing command-line options, but it does it so well. This is the kind of thing that templates allow you to do. In this code:

 int intopt;
 double doubleopt;
 MyEnum enumopt;
 string[] strings;
 auto optInfo = getopt(
     args,
     "i|int", "Int option", &intopt,
     "d|double", "Double option", &doubleopt,
     "e|enum", "Enum option", &enumopt,
     "s|strings", "string list option", &strings,
     );
 if(optInfo.helpWanted) {
     defaultGetoptPrinter("myapp", optInfo.options);
 }

Will do what you expect it to do, and maybe more. It automatically converts the strings given to the program at run-time to the types of the variables that store them. Even better, look at the -s option. If the program is then called with -s string1 -s string2, it’ll hold those two string values. It really is as easy as it looks.

struct constructors: or the lack thereof. Basically, if you write this:

struct Foo {
    int i;
    string s;
    void func() {} //just so it isn't a POD
}

auto foo = Foo(4, "foo");

It’ll compile and work. You can define struct constructors if you want, but it’s the bog-standard one, you don’t have to.

I’m sure there are other lesser-known features in D that make real life programming easier, but these are the ones I can think of of right now.

Valgrind-driven development

At work I’m part of a team that’s responsible for maintaining a C codebase we didn’t write. This is not an uncommon occurrence in real-life software development, and it means that we don’t really know “our” codebase all that well. To make matters worse, there were no unit tests before we took over the project, so knowing what any function is supposed to do is… challenging.

So what’s a coder to do when confronted with a bug? I’ve come to use a technique I call valgrind-driven development. For those not in the know (i.e. not C or C++ programmers), valgrind is a tool that, amongst other things, lets you precisely find out where in the code memory is leaking, conditional jumps are done based on uninitialised values, etc., etc. The usefulness of valgrind and the address sanitizers in clang and gcc cannot be overstated – but what does that have to do with the problem at hand? Say you have this C function:

int do_stuff(struct Foo* foo, int i, struct Bar* bar);

I have no idea how this is supposed to work. The first thing I do? This (using Google Test and writing tests in C++14):

EXPECT_EQ(0, do_stuff(nullptr, 0, nullptr);

99.9% of the time this won’t work. But the reasons why aren’t documented anywhere. Usually passing null pointers is assumed to not happen. So the program blows up, and I add this to the function under test:

int do_stuff(struct Foo* foo, int i, struct Bar* bar) {
    assert(foo != NULL);
    assert(bar != NULL);
    //...

At least now the assumption is documented as assertions, and the next developer who comes along will know that it’s a violation of the contract to pass in NULLs. And, of course, that this function is only used internally. Functions at library boundaries have to check their arguments and fail nicely.

After the assertions have been added, the unit test will look like this:

Foo foo{};
Bar bar{};
EXPECT_EQ(0, do_stuff(&foo, 0, &bar);

This will usually blow up spectacularly as well, but now I have valgrind to tell me why. I carry on like this until there are no more valgrind errors, at which point I can actually start testing the functionality at hand. Along the way I’ve built up all the expected dependencies of the function under test, making explicit what was once implicit in the code. More often than not, I usually find other bugs lurking in the code just by trying to document, via unit tests, what the current behaviour is.

If you have to maintain C/C++ code you didn’t write, give valgrind-driven development a try.

Is there a thing as too many types?

I’m writing a build system at the moment. As it turns out, even the most mundane builds end up having to specify a lot of options. Imagine I want to make life easy for the end-user (and of course I do) and instead of specifying all the files to be built, we just specify directories instead. In pseudo-code:

myApp = build(["path/to/srcs", "other/path/srcs"])

But, for reasons unbeknownt to me as well as a few good ones, it’s often the case that there are files in those directories that aren’t supposed to be built. Or extra files lying around that aren’t in those directories but that do need to be built. And then there’s compiler flags and include directories, so a C++ build might look like this:

myApp = buildC++("name_of_binary",
                 ["path/to/srcs", "other/path/srcs"],
                 ["extrafile.cpp", "otherextra.cpp"],
                 "-g -O0",
                 ["include_from_here", "other_include_dir", "yet_another"],
                 ["badfile.cpp", "otherbadfile.cpp"])

If you’re anything like me you’ll find this confusing. All the parameters (and there are several) are either strings or lists of strings. The compiler flags were passed in as a string so they stand out a bit, but the API could have chosen yet another list and it’s hard to say what is what. This didn’t seem like an API I wanted to use so I definitely wouldn’t expect anyone else to want to use it either.

Then the mental warping caused by me learning Haskell kicked in. What if… I add types to everything? They won’t even really do anything, they’re just there to tag what’s what and cause a compilation error if the wrong one is used. So I added a bunch of wrapper types instead, and in actual D code it started to look like this:

alias cObjs = cObjects!(SrcDirs(["etc/c/zlib"],
                        Flags("-m64 -fPIC -O3"),
                        IncludePaths(["dir1, "dir2"]),
                        SrcFiles("extra_src.c"),
                        ExcludeFiles(["badfile.c", "otherbadfile.c"]);

Better? I think so! You can’t confuse what’s what, and even better, neither can the compiler. After I exposed this to other people, it was pointed out to me that there are many ways to select files and that the API as it was would have to have even more parameters to satisfy all needs. It was big enough as it is so I changed it to something that looks like this now:

alias objs = targetsFromSources!(Sources!(Dirs(["src"]),
                                          Files(["first.d", "second.d"]),
                                          Filter!(a => a != "badfile.d")),
                                 Flags("-g -debug -cov -unitest"),
                                 ImportPaths(["dir1", "dir2"]),
                                 StringImportPaths(["dir1", "dir2"]));

Better? I think so. What do you think?

The Loopers and the Algies

In my opinion there are two broad categories of programmers: the Loopers and the Algies.

The Algies hate duplication. Maybe above all else. Boilerplate is anathema to them, common patterns should be refactored so the parts that look almost-the-same-but-not-quite end up calling the same code. When Algies write C++, they use <algorithm>. They use itertools and functools in Python. They tend to like programming languages with first-class functions, that let you abstract, that let you collapse code. They loathe writing the same code over and over again.

Loopers don’t mind repetition as much. Their name comes from the way they write code: for, while, do/while loops and their equivalents everywhere. They end up writing the same loops over and over again, but they don’t mind. It’s not that they’re bad programmers – it’s just that all the things the Algies like? They think it makes their code more complex. Harder to understand. I actually saw one of them say during code review that this:

stuff = []
for x in xs:
    if condition(x):
        stuff.append(x)

Was simpler than this:

    stuff = [ x for x in xs if condition(x) ]

That’s how their brain works. A loop is… simpler. If it’s not evident by now, I’m an Algie through and through. Somebody asked me one day why I didn’t like C and the first thing that came to my mind was “C makes me repeat myself”.

It’s really hard to get people to agree on what good code actually is. I think most of us agree that we want our software to be maintaiable. Readable. But we all have different ideas on what each of those words mean. Take complexity for instance: we want less of it right? Well, that’s why the Loopers like for loops, in their opinion that reduces complexity. map, filter and reduce are complicated to them, whilst to me, they’re my bread and butter. Not only that, it’s my honest opinion that using algorithms reduces overall complexity. Fewer moving parts. Fewer things to reason about. Fewer bugs.

I think many programming language wars are mostly about people from very different philosophies arguing about things that are important to them that the other side just doesn’t understand. Go, for example, is quite clearly a language for Loopers. They don’t need generics, that would complicate the language. The moment I realised Go wasn’t for me was when I realised that without generics, there could be no Go equivalent of <algorithm> and that I’d have to write for loops. And I really don’t like writing for loops. You shouldn’t either. Unless you’re a Looper of course, and to each his own.

DConf 2015 has come and gone

It was a blast, just like last year was. It really is gratifying to spend 3 days discussing one’s favourite programming language with several people who really know their craft. Of course, attending the presentations and being able to ask questions live isn’t a bad deal either.

I think the highlight for me was Andrei Alexandrescu’s provocatively titled “Generic Programming Must Go”, in which he describes a new way of designing code based on compile-time reflection. He apparently came up with it while designing the new allocators for the D standard library, and I highly recommend watching it (and the other videos!) when they’re online.

I’m actually getting to write D code back at work as well, so that’s always good. In my spare time I’m furiously working on a meta build system, which will definitely feature in a future blog post.