Is it relevant to have a code coverage target? In a talk at NDC Oslo 2014 Uncle Bob said that the only reasonable goal is 100%. On the other hand, Mark Seemann recently said on Twitter and in a follow-up blog post that “I thought it was common knowledge that nothing good comes from code coverage targets”. Those are two seemingly opposing views.
Before looking at the role of code coverage, I’d like to take a few steps back and look at the objectives of testing.
When working properly with test driven development, no production code is written unless there is a failing test driving the change. As soon as the test passes, no more production code may be added until there is again a failing test. If that principle is followed, there will simply be no production code that is not covered by a test.
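As a minimal sketch of that red-green rhythm (xUnit syntax; the Parser class is invented for the example, not code from any real project):

```csharp
using Xunit;

public class ParserTests
{
    // Red: this test is written first and fails, because Parse doesn't exist yet.
    [Fact]
    public void Parse_EmptyString_ReturnsZero()
    {
        Assert.Equal(0, Parser.Parse(""));
    }
}

public static class Parser
{
    // Green: only now is this production code written, and only enough of it
    // to make the failing test pass. A later failing test will force a real
    // implementation. Every line here exists because a test demanded it,
    // so every line is covered.
    public static int Parse(string input)
    {
        return 0;
    }
}
```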
My objective is to achieve high code quality.
My way of achieving that is TDD.
An effect of that is that I do get full code coverage.
Logical Implication
The reasoning above is an example of logical implication: P → Q. Let P = “high code quality” and Q = “high code coverage”. Then P → Q. In plain English it means that if P is true, Q has to be true too. In what cases can we draw a conclusion on the value of P if we know the value of Q? In exactly one case: if Q is false, then P must also be false. If Q is true, we can’t say anything about the value of P.
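Spelled out as a truth table (standard propositional logic; the contrapositive identity at the end is what the next section relies on):

```latex
% Truth table for P -> Q, with P = "high code quality"
% and Q = "high code coverage".
\begin{array}{cc|c}
P & Q & P \to Q \\
\hline
T & T & T \\ % high coverage alone cannot confirm high quality
T & F & F \\ % ruled out: quality without coverage contradicts P -> Q
F & T & T \\ % high coverage with low quality is perfectly possible
F & F & T \\ % low coverage does force low quality (contrapositive)
\end{array}
\qquad
(P \to Q) \;\equiv\; (\lnot Q \to \lnot P)
```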
In propositional logic, it is well known that an implication does not work backwards. In the case of code coverage and testing, it is well known that high code coverage is not the same as functional coverage or high quality. I’ve written about it, and Martin Fowler shares a real horror story in his article on assertion-free testing.
How to Interpret Code Coverage
Code coverage can be deceiving by giving us the comfort of a good metric without actually guaranteeing anything. But that doesn’t mean that it’s useless. The contrapositive of an implication works backwards: if P → Q and we want to ensure that P is true, we first have to ensure that Q is true. So if the code coverage number drops, we can safely draw the conclusion that the functional coverage is lower too. That’s why I still think that 100% is the only reasonable code coverage target. That’s what I answered when Uncle Bob asked if anyone in the room had a coverage goal.
Is 100% realistic? Hell no, no one can get 100% coverage. But it’s the only meaningful goal. If your goal is 80%, then what you are saying is that it’s okay if 20% of the code doesn’t work. Your goal has to be 100%; even though you cannot hit it, it’s an asymptotic goal.
Refactoring and Redundant Code
There are two more cases where I think that code coverage can really help. One is during refactoring, where it can help ensure that no extra functionality is accidentally added. When doing major refactoring, it is very easy to accidentally introduce some extra feature. It could be a small error check or it could be a minor tweak to an algorithm. Keeping track of code coverage will clearly show if such an addition has been made.
There is also the opposite case, where refactoring moves functionality to a new place so that some old handling of a case becomes unreachable code. That will also be detected by code coverage.
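A hypothetical illustration of the first case (the names are invented for the example): suppose a refactoring quietly adds a guard clause that no test asked for.

```csharp
using System;

public static class Discounts
{
    // Before the refactoring, every line here was demanded by a test.
    public static decimal Apply(decimal price, decimal rate)
    {
        return price - price * rate;
    }

    // After the refactoring, a guard clause has crept in "while we were
    // in there". No test drives it, so a 100.0% baseline immediately
    // flags the throw as the only uncovered line: either it is a new
    // feature that deserves a test, or it should not have been added.
    public static decimal ApplyRefactored(decimal price, decimal rate)
    {
        if (rate < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(rate));
        }
        return price - price * rate;
    }
}
```

The opposite case shows up the same way: a branch that a refactoring has made unreachable turns up as the lone uncovered spot in the report.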
Deliberate Exclusion
There is however one obstacle to using code coverage as the detection mechanism for those cases. We need a way to make it immediately obvious that there is now a redundant or untested line of code somewhere. We need to make that line stand out. We need to make sure that it is the one line that breaks the perfect 100.0% code coverage score.
But, there are always things that cannot reasonably be tested. For example, in Kentor.AuthServices there are several parts of the HttpModule API that are extremely hard to test because of how the IIS API is designed. That code is mostly plumbing code, with very few changes. There are very few conditional branches in the code, so a few simple high-level integration tests will find any problems. I’ve come to the conclusion that it’s not worth the effort to write unit tests for those. But I still want the 100.0% code coverage score.
This is actually how the entire Twitter thread by Mark Seemann started.
TIL that there's an [ExcludeFromCodeCoverage] attribute in .NET https://t.co/W9Ihg3yQj9
…but WHY?!
— Mark Seemann (@ploeh) November 12, 2015
I use the [ExcludeFromCodeCoverage] attribute to make deliberate deviations from the full-code-coverage standard. Not all code can be tested at a reasonable cost, so a full 100.0% code coverage target is not possible; somewhere between 80% and 100% is realistic. But I want those exclusions to be deliberate. If I accepted an 87.4% coverage, there would be no way I could tell if I accidentally missed coverage on another line (one that mattered). It’s the same as compiler warnings: the only policy that works in the long term is 0 warnings. Once in a while there are warnings that are safe to ignore – that’s where we use a #pragma to suppress the warning in that specific place. [ExcludeFromCodeCoverage] is the corresponding mechanism for code coverage.
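A sketch of how that looks in code (the classes are invented for illustration; the attribute itself is real and lives in System.Diagnostics.CodeAnalysis):

```csharp
using System.Diagnostics.CodeAnalysis;

// Deliberately excluded: plumbing with no conditional logic, exercised by
// high-level integration tests instead of unit tests. The attribute keeps
// these lines out of the denominator, so the coverage report can stay at
// a meaningful 100.0%.
[ExcludeFromCodeCoverage]
public class ModulePlumbing
{
    public void Register() { /* framework wiring only */ }
}

public class Other
{
    public void Work()
    {
#pragma warning disable CS0618 // the same idea for compiler warnings:
        LegacyCall();          // suppress one known-safe case in place,
#pragma warning restore CS0618 // keep the project at 0 warnings overall.
    }

    [System.Obsolete("Kept for illustration")]
    private void LegacyCall() { }
}
```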
In the end I don’t think that I, Mark Seemann or Uncle Bob are that far apart in our views on TDD. Code coverage is just a means to reach the real goal: high code quality. There is no value in a high code coverage number on its own.
Some of our PHBs are also obsessed with code coverage percentages. This often leads to tests of bad quality – tests that are written only so that the coverage is high. Those tests often do not verify the result enough. We try to address this with mutation testing. The mutation test framework applies changes to your code under test and re-runs the tests. If the test still passes, the test is missing verification of the result. If the test fails, the mutation has been discovered by the test. After the mutation test run, you get a report of how many mutations were “killed” (discovered by failing tests). I think a high code coverage with a high “kill rate” in the mutation tests gives a much better view of the code quality.
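To make that concrete, here is an invented example of the kind of mutant such a framework generates (the code and tests are hypothetical, shown side by side for illustration; a real tool rewrites the original in place):

```csharp
using Xunit;

public static class Ages
{
    // Original: returns true for adults.
    public static bool IsAdult(int age) => age >= 18;

    // A mutation framework might generate this mutant, changing >= to >.
    public static bool IsAdult_Mutant(int age) => age > 18;
}

public class AgeTests
{
    // This test survives the mutation: 30 > 18 and 30 >= 18 agree,
    // so the mutant is NOT killed and the test verifies too little.
    [Fact]
    public void Adult_IsAdult() => Assert.True(Ages.IsAdult(30));

    // This boundary test kills the mutant: IsAdult(18) is true in the
    // original, but the mutant would return false and fail the test.
    [Fact]
    public void EighteenYearOld_IsAdult() => Assert.True(Ages.IsAdult(18));
}
```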
I had no idea that such a thing as a mutation test framework existed. It sounds like a really great automatic approach to validating tests. I do agree with you that tests written just to get high coverage tend to lack the proper assertions. Proper use of TDD would mitigate this: with TDD, any line of production code should be written to make a test pass.
When I do TDD I try to break my own tests as often as I can. I first write a test that describes the functionality. I then go on and try to figure out a way to make the test pass with an incorrect implementation. That means I have to go back and improve the test, or write another test case to cover a separate angle of the problem. Finally I get “stuck” at the point where the only thing I can do to make the tests pass is what I think is the correct implementation. That’s when I’m done.
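A sketch of that workflow with an invented example (leap years, not code from the post):

```csharp
using Xunit;

public class LeapYearTests
{
    // Step 1: a test that describes the functionality.
    [Fact]
    public void Year2000_IsLeapYear() => Assert.True(Calendar.IsLeapYear(2000));

    // Step 2: "return true;" would pass the test above with an incorrect
    // implementation, so the suite gets a counterexample.
    [Fact]
    public void Year1900_IsNotLeapYear() => Assert.False(Calendar.IsLeapYear(1900));

    // Step 3: "year % 400 == 0" would pass both tests above and still be
    // wrong, so another angle of the problem gets its own test case.
    [Fact]
    public void Year2004_IsLeapYear() => Assert.True(Calendar.IsLeapYear(2004));
}

public static class Calendar
{
    // "Stuck", in the good sense: the simplest code that passes all three
    // tests is the correct leap year rule.
    public static bool IsLeapYear(int year) =>
        year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```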