Reaching 100% test coverage is a dream for many teams. The metric used is code coverage, but is that enough? Unfortunately not. Code line coverage is not the same as functional coverage, and it is full functional coverage that really matters in the end.
Look at a simple method that formats a string.
public static string Format(int? value)
{
    const int defaultValue = 42;

    if (!value.HasValue)
    {
        value = defaultValue;
    }

    return "The value is " + defaultValue + ".";
}
There is a correct test for this method that passes and gives 100% code coverage. Still, there is a severe bug in there. Can you spot it? (If you can't, just keep reading; it will become obvious when another test is added later.)
The Test
The test gives full code coverage and is indeed a correct test.
[TestMethod]
public void Test()
{
    var result = ValueFormatter.Format(null);
    Assert.AreEqual("The value is 42.", result);
}
This test gives full code coverage because it enters the if statement in the tested method and runs all lines of code. It does not, however, give full functional coverage: for that, the two different cases, using a given value and using the default value, have to be tested independently.
The Correct Tests
One test case is not enough, not even for this small method. Two separate cases are needed.
[TestMethod]
public void TestDefaultValue()
{
    var result = ValueFormatter.Format(null);
    Assert.AreEqual("The value is 42.", result);
}

[TestMethod]
public void TestGivenValue()
{
    var result = ValueFormatter.Format(7);
    Assert.AreEqual("The value is 7.", result);
}
The TestDefaultValue test is the same as the previous one, just renamed for clarity. The second test is the missing one: the one verifying that the method works correctly when the if statement is not entered. It clearly exposes the bug: the mistaken use of the defaultValue constant instead of the value variable in the string formatting.
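For completeness, the fix is to use the value variable instead of the defaultValue constant in the return statement:

public static string Format(int? value)
{
    const int defaultValue = 42;

    // Fall back to the default when no value is given.
    if (!value.HasValue)
    {
        value = defaultValue;
    }

    // Use the (possibly defaulted) value, not the constant.
    return "The value is " + value + ".";
}

With this version both TestDefaultValue and TestGivenValue pass.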
Functional Coverage
A simple use of an incorrect identifier created a bug that wasn't found by the tests, even though the code coverage report showed 100% line coverage. The problem was that we lacked functional coverage. If functional coverage is what matters, why are we using code coverage as a metric? The answer is that functional coverage is incredibly hard to measure. It is hard to measure automatically, and it is hard to check manually too. Trying to get full functional coverage for an existing larger code base will lead to a lot of redundant tests that are costly to write and maintain.
So what can we do if it is not feasible to write covering tests for existing code?
Turn it the other way around: make sure that there is a covering test for any added functionality at the time the code is written. Write the test for the wanted functionality first, make sure the test fails, and then write code until it passes. Does that sound familiar? It should, because it is Test Driven Development (TDD).
Test Driven Development
The only way I know of that works to make sure that all functionality is covered by tests, without getting a lot of redundant tests, is TDD. Letting the tests drive the development of functionality in the code will ensure that no functionality is in the code without a test. It will also decrease test redundancy, as it is only meaningful to create a test if it drives new functionality. In TDD, tests are not written for existing functionality.
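To make the red-green cycle concrete, here is a minimal sketch of one iteration on the Format method, using a hypothetical new requirement that values above 100 should be capped (the requirement, the test name, and the maxValue constant are all made up for illustration):

// Red: write the test for the new behaviour first and watch it fail.
[TestMethod]
public void TestLargeValueIsCapped()
{
    var result = ValueFormatter.Format(200);
    Assert.AreEqual("The value is 100.", result);
}

// Green: extend Format just enough to make the new test pass,
// while keeping TestDefaultValue and TestGivenValue green.
public static string Format(int? value)
{
    const int defaultValue = 42;
    const int maxValue = 100; // hypothetical requirement

    if (!value.HasValue)
    {
        value = defaultValue;
    }

    if (value > maxValue)
    {
        value = maxValue;
    }

    return "The value is " + value + ".";
}

Because the test exists before the capping code does, the new branch can never end up in the code base untested.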
To get the full advantage of automated tests they have to be covering; otherwise you will end up doing time-consuming manual tests as well. You will avoid refactoring, in fear of breaking something that is not covered by the tests. Unless you are particularly strong-willed, you will probably eventually lose faith in the automated tests and let them rot away, not bothering to fix failed tests.
I'm not saying that TDD is simple, nor that it is without faults. But it is the only method I know of that gives full functional coverage and has the tests so tightly integrated into the creation process that tests are guaranteed to be created. Until someone comes up with something better, I'll continue using and advocating TDD.
Comments

“why are we using code coverage as a metric? The answer is that functional coverage is incredibly hard to measure”
No, it's not. It is standard functionality in SonarQube for Java, and included in the default configuration. They call it “branching coverage”: it checks that all paths through your code have been covered, and it would immediately flag up your example…
TDD cannot ensure that all functionality of your code is covered by tests, because the tests in TDD are written by humans, and humans are not perfect. As soon as your code grows and gets refactored, the situation gets worse.
Branching coverage cannot ensure functional coverage in any way. Just imagine tests without asserts: you can get perfect branching coverage, but it just shows the absence of exceptions.
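For example, a pair of assert-free tests like the sketch below (the test names are made up) would execute both branches of the Format method, reaching 100% branch coverage while verifying nothing about the output:

[TestMethod]
public void ExercisesDefaultBranch()
{
    // Enters the if branch, but asserts nothing about the result.
    ValueFormatter.Format(null);
}

[TestMethod]
public void ExercisesGivenValueBranch()
{
    // Skips the if branch; only proves no exception is thrown.
    ValueFormatter.Format(7);
}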
In my humble opinion, the only way to make sure that all functionality of your code is tested is mutation testing (http://en.wikipedia.org/wiki/Mutation_testing). That's why I'm working on a mutation testing tool for JavaScript. Currently this is just an experiment. My theory is that a combination of mutation testing and tests on a functional/integration level (no internal “unit” tests, which make no sense for non-developers) can be more productive and give more security than TDD in many situations. My proposed “XDD” is:
1. Write code to implement a feature/story. (Use TDD only if it is really faster, makes you really happy, and the code is really cleaner without test-induced complexity.)
2. Run mutation tests (sketched below).
3. If untested code is found: add tests and go to 2.
4. If the code still looks dirty: refactor, keep the tests green, go to 2.
5. You have perfectly tested code.
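To make step 2 concrete, here is a hand-rolled sketch of the mutation testing idea applied to the corrected Format method. A real tool generates and runs the mutants automatically; the two mutants, the test table, and all names below are made up for illustration:

using System;
using System.Collections.Generic;

public static class MutationSketch
{
    // The corrected implementation under test.
    static string Format(int? value)
    {
        const int defaultValue = 42;
        if (!value.HasValue) { value = defaultValue; }
        return "The value is " + value + ".";
    }

    // Mutant 1: a tool might negate the if condition.
    static string Mutant1(int? value)
    {
        const int defaultValue = 42;
        if (value.HasValue) { value = defaultValue; }
        return "The value is " + value + ".";
    }

    // Mutant 2: swap the identifier in the return statement.
    // This is exactly the bug from the article.
    static string Mutant2(int? value)
    {
        const int defaultValue = 42;
        if (!value.HasValue) { value = defaultValue; }
        return "The value is " + defaultValue + ".";
    }

    public static void Main()
    {
        // The test suite, expressed as (input, expected output) pairs.
        var tests = new List<(int? Input, string Expected)>
        {
            (null, "The value is 42."), // TestDefaultValue
            (7, "The value is 7."),     // TestGivenValue
        };

        var mutants = new Dictionary<string, Func<int?, string>>
        {
            ["negated condition"] = Mutant1,
            ["wrong identifier"] = Mutant2,
        };

        // A mutant is "killed" when at least one test detects it.
        foreach (var pair in mutants)
        {
            bool killed = false;
            foreach (var test in tests)
            {
                if (pair.Value(test.Input) != test.Expected)
                {
                    killed = true;
                    break;
                }
            }
            Console.WriteLine(pair.Key + ": " + (killed ? "killed" : "SURVIVED"));
        }
    }
}

With both tests present, both mutants are killed. Remove TestGivenValue from the table and the “wrong identifier” mutant survives, which is precisely how a mutation testing run would have flagged the missing test from the article.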
What do you think about this approach?