Controversial, I know. Let me start by saying this: I think line coverage tools are useful and should be used. But most people get a false sense of security by shooting for what I consider meaningless metrics, such as achieving x% line coverage.
One of the problems is coupling: good tests aren’t coupled to the implementation code, and one should be free to change the implementation completely without breaking any tests. But line coverage is a measurement of supposed test quality that is completely dependent on the implementation! If that doesn’t set off alarm bells, it should. I could replace the implementation entirely and the line coverage would probably change. Did the quality of my tests change? Obviously not.
Another problem I’ve seen is that code coverage metrics cause people to write “unit tests” for utility functions that print out data structures. There are no assertions in the “test” at all; all it does is call the code in question to get a better metric. Is that really providing a stronger guarantee that the software works as intended? Beware the cobra effect, and, as is nearly always the case, Dilbert has something to say about the danger of introducing metrics and encouraging engineers to make them better.
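To make that concrete, here’s a minimal sketch of the kind of assertion-free “test” I mean. The utility and its fields are hypothetical, but the shape is the same: the code gets executed, coverage goes up, and nothing is actually verified.

#include <stdio.h>

/* Hypothetical utility that just prints a data structure. */
struct List { int values[4]; int count; };

static void util_print_list(const struct List* list) {
    for (int i = 0; i < list->count; ++i)
        printf("%d\n", list->values[i]);
}

/* The coverage-chasing "test": it calls the code but asserts nothing,
 * so it can never fail -- yet every line above now counts as covered. */
int main(void) {
    struct List list = { {1, 2, 3, 4}, 4 };
    util_print_list(&list);
    return 0; /* always "passes" */
}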
Last week at work I encountered yet another real-life example of how pursuing code coverage by itself can be a fruitless endeavour. I wrote a UT for a C function with a signature something like this:
int func(struct Foo* foo, struct Bar* bar);
So I started out with my valgrind-driven development and ended up filling both structs with suitable values, and all the test did was assert on the return value. I looked at the line coverage and the function was nearly 100% covered. Great, right? No. The issue is that, in typical C fashion, the return code of this particular function wasn’t nearly as interesting as its side effects on the passed-in structs, and I hadn’t checked those at all. Despite not having tested that the function actually did what it was supposed to, I had nearly 100% line coverage. By that metric alone my test was great. By the metric of preventing bugs… not so much.
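Here’s a minimal sketch of what I mean, with made-up struct fields and a toy implementation standing in for the real func. The first assert is all my original test did, and it already yields near-100% line coverage; the asserts on the structs are the part I had skipped.

#include <assert.h>

/* Hypothetical stand-ins for the real structs and function; the field
 * names and the behaviour of func are invented for illustration. */
struct Foo { int input; int result; };   /* func is assumed to fill in result */
struct Bar { int input; int count;  };   /* func is assumed to bump count */

/* Toy implementation so the sketch compiles; the real func did real work. */
static int func(struct Foo* foo, struct Bar* bar) {
    foo->result = foo->input * 2;
    bar->count += 1;
    return 0;
}

int main(void) {
    struct Foo foo = { .input = 21 };
    struct Bar bar = { .input = 7 };

    int rc = func(&foo, &bar);

    /* What my original test did: this alone gave near-100% line coverage. */
    assert(rc == 0);

    /* What it should also have done: check the side effects on the structs,
     * which is where the interesting behaviour actually lives. */
    assert(foo.result == 42);
    assert(bar.count == 1);

    return 0;
}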
So what is line coverage good for? In my opinion, identifying the gaps in your testing. Do you really care if no test calls your util_print function? Probably not, so seeing that as not covered is ok. Any if statement (or else clause) that isn’t entered, however… you probably want to take a look at that. I tend to do TDD myself, so my line coverage is high just because lines of code don’t get written unless there’s an accompanying test. Sometimes I forget to test certain inputs, and the line coverage report lets me know I have to write a few more tests. But depending on it as a metric and seeing higher coverage in and of itself as a goal? That’s not something I believe in. At the end of the day, code with 100% coverage still has bugs. The important thing is to identify techniques for reducing the probability of writing buggy code or introducing bugs into code that worked.
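As an illustration (the function here is invented, not from any real code base), this is the kind of signal I do find useful: a test that only exercises one branch leaves the other marked as uncovered, which tells me exactly which test to write next.

#include <assert.h>

/* Illustrative only: a branch a coverage report would flag. */
static int clamp_positive(int x) {
    if (x >= 0)
        return x;
    else
        return 0;   /* never reached by the test below -> shows up as uncovered */
}

int main(void) {
    /* Only the positive path is exercised; the red line on the else branch
     * is the useful signal: go write a test for negative x. */
    assert(clamp_positive(5) == 5);
    return 0;
}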
If you’re interested in a thoughtful analysis of code coverage, I really enjoyed this article.