Monday, February 06, 2012

Code coverage considered harmful

‘Code coverage considered harmful’. I interviewed a developer not long ago and he said this to me (in so many words). If I could remember his name I’d give him the credit. For at least 7 years now we’ve had a continuous integration process in place for all our projects, with automated unit tests and code coverage measurements. Since DB changes have always been part of our CI process, and since our unit tests could roll back DB transactions (I know, that makes them integration tests, right?), we’ve always aimed for reasonably high code coverage (over 85%), since there seemed to be no excuse for not having it. And since there was an open source tool to measure it (NCover), we started measuring - what harm could it possibly do?

Well, over the years this has become a problem for a number of reasons:
  1. Since the developers know coverage is being measured and have easy access to the results, they often just find uncovered code and write tests to cover it with meaningless asserts – e.g. asserting that a class has 8 properties. Or, potentially worse, with no asserts at all!
  2. Developers have started covering code which is never actually used by the software. It’s lava flow code that’s been superseded by some newer refactoring but never cleaned up. Instead of checking whether the code is still required, a test is created just to bring up coverage.
  3. Developers have not bothered using TDD (or BDD) practices – since the tooling can tell them _after_ they’ve written their code which tests are ‘missing’ they can just write tests to cover the code after the fact.
  4. Which also means they are just coding to a design in their head rather than to a business requirement expressed as a TDD test+assert (or a BDD behaviour+should).
  5. Writing tests after the code also means the tests rarely fail, since the developer assumes the code is correct; if a test does fail they assume the test is wrong rather than the code. They also start using automated assert generation tools – which is pretty scary when you think about it – yes, I’ve just confirmed that my code does exactly what it does… duh.
  6. Boundary conditions are ignored. It doesn’t matter that a range condition exists – one test can cover it, even though the min-1, min, max and max+1 values should ideally each be tested.
There is no business reason why a class should contain 8 properties, there is no business reason for a class to exist at all for that matter. There is no business reason to test code which can never be run in production, there is no business reason to test code so that code coverage is higher. There’s no business reason to generate tests just to satisfy a metric.
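To make points 1 and 6 concrete, here is a small illustrative sketch (in Python rather than our .NET stack; the `Invoice` class and its range check are invented for the example). Both tests below produce identical line coverage, but only the second would catch an off-by-one error in the range check:

```python
class Invoice:
    """Hypothetical class under test, with a business range rule."""
    MIN_LINES, MAX_LINES = 1, 50

    def __init__(self, line_count):
        if not (self.MIN_LINES <= line_count <= self.MAX_LINES):
            raise ValueError("line count out of range")
        self.line_count = line_count

def test_invoice_has_attributes():
    # A coverage-chasing test: executes the constructor, asserts nothing useful.
    inv = Invoice(5)
    assert len(vars(inv)) == 1   # "the class has 1 field" - no business meaning

def test_invoice_boundaries():
    # Boundary-value tests: min-1, min, max and max+1 each pinned down.
    for bad in (Invoice.MIN_LINES - 1, Invoice.MAX_LINES + 1):
        try:
            Invoice(bad)
            assert False, f"{bad} should have been rejected"
        except ValueError:
            pass
    for ok in (Invoice.MIN_LINES, Invoice.MAX_LINES):
        assert Invoice(ok).line_count == ok
```

A coverage tool scores these two tests identically, which is exactly why the metric alone says so little.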

What’s the solution? Probably we should stop measuring coverage, but that alone will not fix the issues above, and might be throwing the baby out with the bath water. Would it be better to have only 50% coverage with good, meaningful tests? After all a big chunk of any code is simple and may not benefit from test automation.

The real solution is to start doing TDD or BDD as it’s supposed to be done, and reviewing the tests that are being written – there is still no good substitute for code inspections. At a minimum, extract all the test case names and put them in front of the business person – if they can understand them, you’re on the right track. If they ask ‘what’s this stuff about making sure we have a non-empty collection of strings?’ you’re probably not.
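As a sketch of what ‘extract all the test case names’ might look like (Python’s `unittest` here purely for illustration; the class and its method names are invented), the idea is that the extracted list should read like requirements:

```python
import unittest

class OverdueInvoiceBehaviour(unittest.TestCase):
    """Hypothetical behaviour-style tests: each name is a readable sentence."""
    def test_an_invoice_is_overdue_thirty_days_after_its_due_date(self):
        pass

    def test_an_overdue_invoice_accrues_interest_daily(self):
        pass

    def test_a_paid_invoice_never_becomes_overdue(self):
        pass

def readable_test_names(case_class):
    """Turn test method names back into plain-English sentences for review."""
    names = unittest.TestLoader().getTestCaseNames(case_class)
    return [n[len("test_"):].replace("_", " ") for n in names]

for sentence in readable_test_names(OverdueInvoiceBehaviour):
    print(sentence)
```

If the printed sentences make sense to the domain expert, the tests are probably testing the right things.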


Anonymous said...

I quite like the idea of having tests execute and report back in plain English so the domain expert can understand it.

Christian Baumann said...

One important part is missing: if entire branches of code were simply never written, no code coverage tool will ever detect that.

Anonymous said...

This is the "targets considered harmful" argument. Targets should be an aid toward achieving known and specified aims. What happens is that the targets *become* the aims, as described.

Yi Wen said...

Yes, the situation mentioned in this post happens in a lot of teams. But is this really test coverage's fault alone? If they didn't use a test coverage tool, would they start using TDD and write good tests? The answer is probably a big no.

With a test coverage tool, at least they start to write tests, and hopefully they can start to think about how to write good tests if they are any good.

Do not blame a tool for failure when really, this is all about people who use the tool.

Fernando said...

"I’d rather have few good tests than to have tons of bad tests which “cover” (no pun intended) the truth: that we have improperly tested code.": Me on unit testing and coverage

Anonymous said...

So what? This is the classic metric problem.

If you start to be lazy and try to track people using a metric, then people will game the metric, whatever the metric is. They are not stupid robots. People are intelligent, and if you think that by default they will devote all that intelligence to serving you, you are really naive.

You should focus on what is really important and you should really check what is going on in the company.

And oh, TDD is not universal. It is a specific methodology that is popular mostly on blogs at the moment.

Giacomo Tesio said...

Stupidity is quite harmful, too.

Fast fix for both: hire better devs.

Monty Goodepuppee said...

Mutation testing is a great way of addressing some of these issues. Tools like Jester will randomly make changes to your code; e.g. changing true to false, etc. Any unit test that continues to pass after that may be considered suspect.
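A toy sketch of the idea (in Python rather than Java/Jester; `is_adult` and the deliberately weak suite are invented for illustration): flip one operator in the code under test and see whether the tests notice. A suite that only checks a non-boundary value lets the mutant survive:

```python
# Code under test, held as source text so we can mutate it.
SOURCE = """
def is_adult(age):
    return age >= 18
"""

def weak_suite(ns):
    """One non-boundary case only - exactly the lazy test the post describes."""
    return ns["is_adult"](30) is True

original, mutant = {}, {}
exec(SOURCE, original)
exec(SOURCE.replace(">=", ">"), mutant)   # the mutation: >= becomes >

print("original passes:", weak_suite(original))   # True
print("mutant survives:", weak_suite(mutant))     # True - suite never tests 18
```

A boundary test asserting `is_adult(18)` would kill this mutant, since the original returns True at 18 and the mutant returns False.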

Robert Pieper said...

I agree with Yi Wen...

1) 100% code coverage is not something on which a developer's pay should be based. It is one of many tools to ensure code quality. It is not THE tool to measure code quality.

2) 100% coverage can be obtained by covering only happy paths and not asserting anything... but is there not at least some tiny value in exercising the code?

3) Not writing an assert for a test is silly. Your developers are probably thieves too. Get new ones.

4) If you are not writing to 100%, which lines of code are worth writing, worth putting into production, worth maintaining, but not worth keeping under some baseline happy-path testing?

5) If you had always done TDD, 100% coverage would be a moot point; you would already have it.

6) Add tests as you maintain the code, not to gold plate the code. If you already have to go in to make a change, cover its existing state in tests so you know if you broke the existing functionality. You should be able to assume production code already works if the users have tested it in production.

7) Really? You have dead code in production and you know about it? No one would knowingly write tests for code that should be deleted. Just delete the useless code.

8) I could go on all day

Graham Pearson said...

Thanks for the comments.

Maybe I've overstated the problems we're facing. The vast majority of our tests are good, isolated, repeatable, independent, non-brittle tests and this is reflected in the quality of the systems we've developed and maintained. In fact the best developers on the team probably never look at code coverage anymore.

So the real question (and the title of the post) is whether measuring code coverage can do more harm than good. On the one hand it's good to know what's been covered, and it gives some 'sense' of the state of the test suite; on the other hand, as has been mentioned, knowing someone is looking at a metric changes your behaviour to meet the metric. Does one outweigh the other?

Or, as I suggested at the end of the post, use coverage simply as an aid to other processes like code reviews (let's review the tests before the code and see if I can understand what the code is supposed to be doing by what tests are being run), and test-first approaches (coverage should be 100% if we're practicing this, so it's an easy initial 'confirmation').

Giacomo said...

Stupidity considered harmful

Robert Pieper said...

In response to my prior comment and the author's current comment... to me, code coverage is not too different from a speedometer. It is used to know something about your current state. Am I going the speed limit? Do I have all my happy paths under test?

Sure, you can break the rules and do what you have to do to see the number you want to see, but at the end of the day the truth is the truth: you are breaking the law or you are not; you have all your logic under (at least) basic tests or you do not.

I wrote a little something on the topic. Feel free to poke holes in it since I poked holes in yours. I take a stance in a heated, rat-hole of a conversation which I feel needed to be represented.


Robert Pieper said...

forgot the link


tanusoft said...

First "quick fix" for your "ugly legacy code" problem: read the documentation of your coverage tool and find out how to configure it to ignore the parts of your source code that you know you don't want tests on.

Or better yet, if that legacy code is now useless due to newer code that has superseded it, shouldn't it be removed from the project?

2nd fix, but less quick... do code reviews on the unit tests. Code coverage metrics are not a substitute for human interaction. If you feel you really need metrics for communicating your needs (for example, if there can be a long time between finding some class to improve and the moment it is fixed, and you want that information kept so it doesn't get forgotten), look for some way to keep such annotations in the code so that they show up through the metrics tools you are using.
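As an illustration of that first fix (shown with Python's coverage.py configuration only because exclusion syntax varies by tool; NCover and other .NET tools have their own exclusion mechanisms, so check your own tool's docs):

```ini
# .coveragerc (coverage.py's config file, shown purely as an example)
[report]
exclude_lines =
    pragma: no cover
    if __name__ == .__main__.:
```

Any line matching one of these regexes is left out of the coverage report, so deliberately untested code stops dragging the number down for the wrong reasons.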
