Controversial, I know. Let me start by saying this: I think line coverage tools are useful and should be used. But most people get a false sense of security from chasing what I consider meaningless targets, such as achieving x% line coverage.
One of the problems is coupling: good tests aren’t coupled to the implementation, and one should be free to change the implementation completely without breaking any tests. But line coverage is a measurement of supposed test quality that is completely dependent on the implementation! If that doesn’t set off alarm bells, it should. I could replace the implementation and the line coverage would probably change. Did the quality of my tests change? Obviously not.
Another problem I’ve seen is that code coverage metrics cause people to write “unit tests” for utility functions that print out data structures. There are no assertions in the “test” at all; all it does is call the code in question to bump up the metric. Is that really providing a stronger guarantee that the software works as intended? Beware the cobra effect, and, as is nearly always the case, Dilbert has something to say about the danger of introducing metrics and encouraging engineers to make them look better.
Last week at work I encountered yet another real-life example of how pursuing code coverage for its own sake can be a fruitless endeavour. I wrote a unit test for a C function that looked something like this:
int func(struct Foo* foo, struct Bar* bar);
So I started out with my valgrind-driven development and ended up filling both structs with suitable values, and all the test did was assert on the return value. I looked at the line coverage and the function was nearly 100% covered. Great, right? No. The issue is that, in typical C fashion, the return code of this particular function wasn’t nearly as interesting as the side effects on the passed-in structs, and I hadn’t checked those at all. Despite not having tested that the function actually did what it was supposed to do, I had nearly 100% line coverage. By that metric alone my test was great. By the metric of preventing bugs… not so much.
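To make the gap concrete, here is a minimal sketch of the two kinds of test. The real structs and implementation aren’t shown in this post, so the fields and the stand-in body of func below are hypothetical; they exist only so the example compiles and runs with plain assert.

#include <assert.h>

/* Hypothetical stand-ins: the real struct Foo, struct Bar and func()
 * are not shown in the post, so these exist only to make the sketch
 * self-contained. */
struct Foo { int id; int processed; };
struct Bar { int count; };

static int func(struct Foo* foo, struct Bar* bar) {
    foo->processed = 1;   /* the interesting work: side effects... */
    bar->count += 1;      /* ...on the passed-in structs */
    return 0;             /* a not-very-interesting status code */
}

/* The coverage-driven test: executes every line of func(),
 * but only asserts on the return code. */
static void test_return_value_only(void) {
    struct Foo foo = { .id = 1, .processed = 0 };
    struct Bar bar = { .count = 0 };
    assert(func(&foo, &bar) == 0);
}

/* The test that was actually needed: the behaviour that matters is the
 * side effects on the structs, so assert on those as well. */
static void test_side_effects(void) {
    struct Foo foo = { .id = 1, .processed = 0 };
    struct Bar bar = { .count = 0 };
    assert(func(&foo, &bar) == 0);
    assert(foo.processed == 1);
    assert(bar.count == 1);
}

int main(void) {
    test_return_value_only();
    test_side_effects();
    return 0;
}

Both tests would report essentially the same line coverage for func; only the second one would catch a regression in the side effects.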
So what is line coverage good for? In my opinion, identifying the gaps in your testing. Do you really care if no test calls your util_print function? Probably not, so seeing it flagged as not covered is ok. Any if statement (or else clause) that is never entered, however… you probably want to take a look at that. I tend to do TDD myself, so my line coverage is high simply because lines of code don’t get written unless there’s an accompanying test. Sometimes I forget to test certain inputs, and the line coverage report lets me know I have to write a few more tests. But depending on it as a metric and treating higher coverage as a goal in and of itself? That’s not something I believe in. At the end of the day, code with 100% coverage still has bugs. The important thing is to identify techniques that reduce the probability of writing buggy code or of introducing bugs into code that worked.
If you’re interested in a thoughtful analysis of code coverage, I really enjoyed this article.
“Did the quality of my tests change? Obviously not.”
I would say that they did. The tests no longer exercise as much of the code. It’s not the tests’ “fault”, but they aren’t as good at their job as they were.
Test quality is only a meaningful idea as coverage of either the possible behaviours of the code or its specified behaviour. If a smaller percentage of the lines of code is covered, it’s a reasonable leap to assume that less of the possible behaviour is covered. Ideally the code will only have behaviours from the spec, but that’s something you want your test suite to be checking (as much as possible)…
Line coverage is an interesting metric. Like many metrics, if you chase it for its own sake, you end up getting developers who are very good at making that number go up, as opposed to developers who are good at testing. I’ve seen codebases with 100% test coverage with worse tests than other codebases with 60% test coverage.
Firstly: it’s possible to cover all the lines without covering all the cases. The degenerate example is the test with no asserts: it calls all the methods but doesn’t verify the return values. Sometimes it’s a good developer having a brain fart. Sometimes it’s a bad developer making the number go up without thinking about why. And that’s just the degenerate example – there are plenty of cases where 100% code coverage does not imply 100% coverage of business cases.
Secondly, where are we getting this metric from? Instrumentation of code whilst running unit tests? There are aspects of my code which I get no value from unit testing – for an extreme example, getters and setters on POJOs – but which are nonetheless properly exercised by other levels of testing, be it integration tests, acceptance tests, etc. Having to replicate testing throughout different layers has a cost.
Thirdly, what are the costs of writing tests? It’s not just the cost of writing them – too many tests make code more expensive to change. The more comprehensive your tests, the higher the cost of keeping them up to date. That’s a good thing when those tests add value. It’s possible to write unit tests which don’t add value, and aiming for 100% unit test coverage pushes you in that direction.
One of my favourite interview questions is: how do you know when you have enough tests? The answer is most definitely not any number on a code coverage metric – even if that number is 100%. The follow-up question of “can you have too many tests?” should hopefully lead people to the cost-benefit analysis thought process.
So, what value is coverage as a metric? Where it really comes into its own is as a comparative metric: when you change code, what code have you added which is now untested? What have you accidentally overlooked?
That’s data which needs to be interpreted by an actual brain which is capable of judging how complex that particular branch of logic is.
But then, dealing with data which needs to be interpreted by an actual brain which is capable of judging – that’s our job, isn’t it?
[…] Measure test coverage. Look at the reports. Make informed decisions on what to do next. I’ve also written about this before. […]