Wednesday, February 08, 2012

Choosing a sprint length

In interviews with candidates I've heard about many different flavours of Scrum. Most teams seem to settle on 2 week sprints, but I've heard of anything up to 10 week sprints (which probably stretches the definition a bit). So how do you choose an ideal sprint length for your project? Firstly, I'd say going beyond 3 weeks probably means you're not going to reap the benefits of the iterative nature of agile: with more than three weeks, the danger of gold plating and of reverting to incremental development will be strong.

So what factors should you consider when choosing a sprint length?
  1. If the requirements are expected to evolve rapidly from sprint to sprint once the business has had a chance to review progress, choose a shorter sprint, i.e. be more agile.
  2. If you think it will be a struggle to groom requirements so that they are ready for a sprint, choose a shorter sprint so you only need one or two weeks of work ready at any one time.
  3. If the team is inexperienced with Scrum, choose a shorter sprint to get more frequent practice in the process.
  4. Larger projects may benefit from a longer sprint, especially if the majority of requirements are well known.
  5. If you think getting commitment from the product owner is going to be a problem, the choice of sprint length won't fix this, but a shorter sprint is probably wise. That way they have shorter but more regular meetings to attend, and you can re-engage with them more often.
Ideally the sprint duration should not change during the project, e.g. running 2 week sprints for the first few and then changing to 3 weeks. Changing the sprint length distorts metrics such as sprint velocity and product burndown charts, which then require more care to interpret.
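
To illustrate the distortion, here is a minimal sketch in Python. The sprint names, lengths and story point totals are made up for the example, and dividing by the number of weeks is just one simple way to compare sprints of different lengths, not a standard Scrum metric.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    name: str
    weeks: int
    points_completed: int


def velocity_per_week(sprint: Sprint) -> float:
    """Story points completed per week, so sprints of different lengths can be compared."""
    return sprint.points_completed / sprint.weeks


# Hypothetical history: two 2 week sprints, then a switch to 3 weeks.
sprints = [
    Sprint("Sprint 1", weeks=2, points_completed=20),
    Sprint("Sprint 2", weeks=2, points_completed=22),
    Sprint("Sprint 3", weeks=3, points_completed=30),
]

for sprint in sprints:
    print(
        f"{sprint.name}: raw velocity = {sprint.points_completed}, "
        f"per week = {velocity_per_week(sprint):.1f}"
    )
```

In this made-up history the raw velocity jumps from 22 to 30 after the change, which looks like a big improvement on a chart, while the per week figure actually drops from 11 to 10.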

Also up for debate is when to start and stop sprints. We've migrated to having sprint boundaries midweek, so that lessons learned from reviews and retrospectives are at the forefront of our minds during sprint planning, which is held on the same day or the next morning rather than after the weekend has passed.

Monday, February 06, 2012

Code coverage considered harmful

‘Code coverage considered harmful’. A developer I interviewed not that long ago said this to me (in more words); if I could remember his name I’d give him the credit. For at least 7 years now we’ve had a continuous integration process in place for all our projects, with automated unit tests and code coverage measurements. Since DB changes have always been part of our CI process, and since our unit tests could roll back DB transactions (I know, that makes them integration tests, right?), we’ve always wanted reasonably high code coverage (over 85%), as there should be no excuse for not having it. And since there was an open source tool to measure it (NCover), we started measuring. What harm could it possibly do?

Well, over the years this has become a problem for a number of reasons:
  1. Since the developers know this is being measured and have easy access to the results, they often just find uncovered code and write tests to cover it with meaningless asserts, e.g. assert that a class has 8 properties. Or potentially worse, with no asserts at all!
  2. Developers have started covering code which is never actually used by the software. It’s lava flow code that has been superseded by some new refactoring but not cleaned up. Instead of checking whether or not the code is required, a test is created to bring up coverage.
  3. Developers have not bothered using TDD (or BDD) practices. Since the tooling can tell them _after_ they’ve written their code which tests are ‘missing’, they can just write tests to cover the code after the fact.
  4. This also means they are coding to a design in their head rather than to a business requirement expressed as a TDD test+assert (or a BDD behaviour+should).
  5. Tests written after the code rarely fail, since the developer assumes the code is correct; if the test fails they assume the test is wrong rather than the code. They also start using automated assert generation tools, which is pretty scary when you think about it. Yes, I’ve just confirmed that my code does exactly what it does... duh.
  6. Boundary conditions are ignored. It doesn’t matter that a range condition exists: one test can cover it, even though the min-1, min, max and max+1 values should ideally all be tested.
There is no business reason why a class should contain 8 properties; there is no business reason for a class to exist at all, for that matter. There is no business reason to test code which can never be run in production, and there is no business reason to test code so that code coverage is higher. There’s no business reason to generate tests just to satisfy a metric.
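
As a rough sketch of the boundary condition point, here is what the four tests might look like in Python's unittest. The rule itself (an order quantity between 1 and 100) and the names `is_valid_quantity`, `MIN_QUANTITY` and `MAX_QUANTITY` are hypothetical, invented for this example. A single test calling `is_valid_quantity(50)` would already report full line coverage for the function, yet it would not notice `<=` being mistyped as `<`.

```python
import unittest

# Hypothetical business rule: an order quantity must be between 1 and 100 inclusive.
MIN_QUANTITY = 1
MAX_QUANTITY = 100


def is_valid_quantity(quantity: int) -> bool:
    """Return True when the quantity falls inside the allowed range."""
    return MIN_QUANTITY <= quantity <= MAX_QUANTITY


class OrderQuantityRules(unittest.TestCase):
    # One test per boundary: min-1, min, max and max+1.
    def test_quantity_one_below_minimum_is_rejected(self):
        self.assertFalse(is_valid_quantity(MIN_QUANTITY - 1))

    def test_minimum_quantity_is_accepted(self):
        self.assertTrue(is_valid_quantity(MIN_QUANTITY))

    def test_maximum_quantity_is_accepted(self):
        self.assertTrue(is_valid_quantity(MAX_QUANTITY))

    def test_quantity_one_above_maximum_is_rejected(self):
        self.assertFalse(is_valid_quantity(MAX_QUANTITY + 1))


# Run the suite explicitly so the sketch works when pasted into a script or a REPL.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderQuantityRules)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Each test name states a rule the business could confirm or reject, which is more than can be said for asserting that a class has 8 properties.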

What’s the solution? Probably we should stop measuring coverage, but that alone will not fix the issues above, and might be throwing the baby out with the bathwater. Would it be better to have only 50% coverage with good, meaningful tests? After all, a big chunk of any code is simple and may not benefit from test automation.

The real solution is to start doing TDD or BDD as it’s supposed to be done, and to review the tests that are being written; there is still no good substitute for code inspections. At a minimum, extract all the test case names and put them in front of the business person. If they can understand them, then you’re on the right track. If they ask ‘what’s this stuff about making sure we have a non-empty collection of strings’, you’re probably not.
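
Extracting the test case names can be a few lines of script. The sketch below uses Python's unittest purely for illustration (our own projects are not in Python); the `InvoiceRules` class and its test names are hypothetical, and the test bodies are left empty because only the names matter here.

```python
import unittest


class InvoiceRules(unittest.TestCase):
    # Hypothetical tests: bodies omitted, only the names are used in this sketch.
    def test_overdue_invoice_accrues_late_fee(self):
        pass

    def test_paid_invoice_cannot_be_cancelled(self):
        pass


def readable_test_names(test_case_class):
    """Turn the test method names of a TestCase class into plain sentences."""
    names = unittest.defaultTestLoader.getTestCaseNames(test_case_class)
    prefix = "test_"
    readable = []
    for name in names:
        if name.startswith(prefix):
            name = name[len(prefix):]
        readable.append(name.replace("_", " "))
    return readable


for line in readable_test_names(InvoiceRules):
    print(line)
```

If the printed list reads like a set of business rules, the tests are probably describing behaviour; if it reads like a tour of the class structure, they are probably chasing coverage.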