Extensive post-release testing is a sign of an unhealthy testing process

Does your organization conduct extensive post-release testing in production environments?

If you do, then it shows you probably have an unhealthy testing process, and you’ve fallen into the “let’s just test it in production” trap.

If testing in non-production environments were reflective of production behaviour, there would be no need to test in production at all. But testing often isn’t reflective of real production behaviour, so we test in production to mitigate the risk of things going wrong.

It’s also often the case that issues found in a QA environment don’t appear in a local development environment.

But it makes much more sense to test in an environment as close to where the code was written as possible: it’s much cheaper, easier and more efficient to find and fix bugs early.

For example, say you were testing how a feature behaves at various times of day across numerous time zones. As you progress through the test environments, this becomes increasingly difficult to test:

In a local development environment: you could fake the time and timezone to see how your application behaves.
In a CI or QA environment: you could change a single server’s time and restart your application to see how it behaves under various time scenarios. This is not as easy as ‘faking’ the time locally, but it’s still fairly easy to do.
In a pre-production environment: you’ll probably have clustered web servers, so you’d need to change the time on something like 6 or 8 servers to test this feature. Plus it will affect anyone else using this system.
In a production environment: you’ll need to wait until the actual time to test the feature as you won’t be able to change the server times in production.
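The ‘fake the time’ approach for a local environment can be sketched in Python. This is a minimal sketch, assuming a hypothetical feature (a time-of-day greeting) and a hypothetical design where the application receives its clock as a parameter instead of reading the system time directly; that injection is what makes faking the time and time zone trivial in a local test.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

Clock = Callable[[], datetime]


def greeting(now: Clock) -> str:
    """Hypothetical feature whose output depends on the local time of day."""
    hour = now().hour
    if 5 <= hour < 12:
        return "Good morning"
    if 12 <= hour < 18:
        return "Good afternoon"
    return "Good evening"


def fixed_clock(instant_utc: datetime, utc_offset_hours: int) -> Clock:
    """Fake clock: always returns the same instant, shown at a fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return lambda: instant_utc.astimezone(zone)


if __name__ == "__main__":
    # One real instant (09:00 UTC), viewed from three different offsets.
    instant = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
    for offset in (0, 5, 10):
        print(f"UTC{offset:+d}:", greeting(fixed_clock(instant, offset)))
```

Fixed UTC offsets keep the sketch dependency-free; they don’t model daylight saving, so named zones via Python’s `zoneinfo` module would be more realistic for a real feature.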

Clearly it’s cheaper, easier and more efficient to test changing times in an environment closer to where the code was written.

You should aim to conduct as much testing as you can in earlier test environments and taper it off, so that by the time you deploy a change into production you’re confident it’s been tested comprehensively. This probably requires some change to your testing process, though.

[Figure: Tests performed per environment]

How to Remedy a ‘Test in Production’ Culture

As soon as you find an issue in a later environment, ask: why wasn’t this found in an earlier environment? Ultimately, ask: why can’t we reproduce this in a local environment?

Some Hypothetical Examples

Example One: our tests fail in CI because of JavaScript errors that don’t reproduce in a local development environment. Looking into this, we realize it’s because the JavaScript is minified in CI but not in a local development environment. We make a change so local development environments can run tests in minified mode, which reproduces these issues.
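One way to make that change is a single switch that decides whether minified assets are served, so a developer can opt into CI’s behaviour locally. This is a hypothetical Python sketch: the `MINIFY_ASSETS` variable, the `/static/` path and the function names are assumptions, as is the convention that the CI server sets `CI=true` (many CI services do, but check yours).

```python
import os
from typing import Mapping, Optional


def use_minified_assets(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to serve minified JavaScript.

    Assumed convention: CI sets CI=true, and a developer can force either
    mode locally with MINIFY_ASSETS=1 or MINIFY_ASSETS=0.
    """
    env = os.environ if env is None else env
    override = env.get("MINIFY_ASSETS")
    if override is not None:
        return override == "1"  # explicit local choice wins
    return env.get("CI") == "true"  # otherwise mirror CI


def script_tag(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the <script> tag for a hypothetical asset called `name`."""
    suffix = ".min.js" if use_minified_assets(env) else ".js"
    return f'<script src="/static/{name}{suffix}"></script>'


if __name__ == "__main__":
    print(script_tag("app", {"MINIFY_ASSETS": "1"}))
```

A developer could then run the test suite locally with `MINIFY_ASSETS=1` set to reproduce the CI failures, without changing how everyone else works day to day.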

Example Two: our tests failed in pre-production but not in QA, because pre-production has a regular backup of the production database whereas QA data often gets very out of date. We schedule a task to periodically restore the QA database from a production snapshot so the data is representative.
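The scheduled refresh could be driven by a simple staleness check run from whatever scheduler you already have (cron, a CI job). A minimal Python sketch, assuming a hypothetical one-week freshness policy; `restore_from_snapshot` stands in for the real, database-specific restore job, which isn’t described here.

```python
from datetime import datetime, timedelta
from typing import Callable

# Assumed policy: QA data should never be more than a week behind production.
MAX_DATA_AGE = timedelta(days=7)


def refresh_qa_if_stale(
    last_restored: datetime,
    now: datetime,
    restore_from_snapshot: Callable[[], None],
    max_age: timedelta = MAX_DATA_AGE,
) -> bool:
    """Run the (hypothetical) restore job if QA data is older than max_age.

    Returns True if a restore was triggered.
    """
    if now - last_restored >= max_age:
        restore_from_snapshot()
        return True
    return False


if __name__ == "__main__":
    def fake_restore() -> None:
        print("Restoring QA database from latest production snapshot...")

    refreshed = refresh_qa_if_stale(datetime(2024, 1, 1), datetime(2024, 1, 10), fake_restore)
    print("refreshed:", refreshed)
```

Passing the restore action in as a callable keeps the scheduling rule testable on its own, without touching a real database.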

Example Three: our tests failed in production but not in pre-production because emails weren’t being sent in production, and we hadn’t been able to test email in pre-production or QA because we didn’t want to accidentally send real emails. We configure our QA environments to send emails, but only to a whitelist of specified email addresses we use for testing, to prevent accidental emails. Now we can be confident that changes to emails are tested in QA.
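The QA whitelist can be a filter applied just before mail is handed to the mail server. A minimal Python sketch, assuming hypothetical addresses and function names; in this design the filter would only be switched on in non-production configuration.

```python
from typing import Iterable, List, Tuple

# Hypothetical whitelist of addresses the testers control.
TEST_RECIPIENTS = frozenset({"qa-team@example.com", "tester1@example.com"})


def filter_recipients(
    recipients: Iterable[str], allowed: Iterable[str] = TEST_RECIPIENTS
) -> Tuple[List[str], List[str]]:
    """Split recipients into (sendable, blocked) using a case-insensitive whitelist."""
    allowed_lower = {address.strip().lower() for address in allowed}
    sendable: List[str] = []
    blocked: List[str] = []
    for address in recipients:
        if address.strip().lower() in allowed_lower:
            sendable.append(address)
        else:
            blocked.append(address)
    return sendable, blocked


if __name__ == "__main__":
    ok, dropped = filter_recipients(["QA-Team@example.com", "customer@example.org"])
    print("send to:", ok)
    print("blocked:", dropped)
```

Blocked addresses are returned rather than silently dropped, so a test can assert that nothing would have gone to a real customer.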

Summary

It’s easy to fall into the trap of just testing things in production, even though it’s much more difficult and risky: things often go wrong with real data, the consequences are more severe, and it’s generally harder to test comprehensively in production because you can’t change or fake things as easily.

Instead of just accepting “we’ll test it in production”, try asking, “how can we test this much earlier whilst being confident our tests reflect actual behaviour?”

You’ll be much less stressed, your testing will be much more efficient and effective, and you’ll have a healthier testing process.

Fixing bugs in production: is it that expensive any more?

You’ve most likely seen a variant of this chart before:

[Figure: bug fix costs]

I hadn’t seen it for a while until yesterday, but it’s an old favourite of test managers and test consultants for justifying a lot of testing before releasing to production.

But I question whether it’s that accurate anymore.

Sure, in the good old days of having a production release once or twice a year, it cost orders of magnitude more to fix a bug in production. But does it really cost that much more in the present age of continuous delivery/continuous deployment, where we release into production every fortnight/week/day?

If the timeline on the chart above is a year, then of course bugs will cost more to fix, because presumably, if the project took a year to start with, you don’t have a very rapid software development process. And there are more likely to be requirements ‘bugs’ in production, because an awful lot can happen in the year that a requirement is being developed. Hence along came agile, with its smaller iterations and frequent releases.

Mission critical systems aside, most web or other software applications we build today can be easily repaired.

Big waterfall projects, like building a plane, are bound to fail. The Boeing 787 Dreamliner was an epic fail. Not only did it suffer five delays and arrive many years late, it had two major lithium-ion battery faults in its first 52,000 hours of flying. These caused months of grounding and have no doubt affected future sales, costing millions of dollars in damages. But it seems to have been well tested:

“To evaluate the effect of cell venting resulting from an internal short circuit, Boeing performed testing that involved puncturing a cell with a nail to induce an internal short circuit. This test resulted in cell venting with smoke but no fire. In addition, to assess the likelihood of occurrence of cell venting, Boeing acquired information from other companies about their experience using similar lithium-ion battery cells. On the basis of this information, Boeing assessed that the likelihood of occurrence of cell venting would be about one in 10 million flight hours.”

NTSB Interim Report DCA13IA037 pp.32-33

After months of grounding, retesting and completely redesigning the battery system, the cause of the original battery failures is still unknown. If they can’t work out what the problem is after it has occurred twice in production, it’s unlikely it could have been found or resolved in initial testing.

But most of us don’t work on such mission critical systems anyway.

And production fixes can be very easy.

Take this very different example: I provide support for a production script that uploads a bunch of files to a server. There was a recent issue where a file name had an apostrophe in it, which meant the file was skipped when it should have been uploaded.

Upon finding out about the problem, I immediately looked at my unit tests. Did I have a unit test with a file name containing an apostrophe? No, I didn’t. I wrote a quick unit test and it failed, as expected. I made a quick change to the regular expression constant that matches file names so that it included apostrophes, then reran the unit test, which passed. Yippee. I quickly reran all the other unit and integration tests and they all passed, meaning I could confidently package and release the script. All of this took a few minutes.
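The post doesn’t show the actual script, so the following is a hypothetical Python sketch of the same kind of fix: a file-name pattern constant that originally omitted the apostrophe, and the regression test that would have caught it. The patterns, function name and file names are all assumptions for illustration.

```python
import re
import unittest

# Hypothetical pattern before the fix: word characters, spaces, dots, dashes.
OLD_FILE_NAME_PATTERN = re.compile(r"[\w .-]+")
# Hypothetical pattern after the fix: the apostrophe is added to the class.
FILE_NAME_PATTERN = re.compile(r"[\w .'-]+")


def should_upload(file_name: str, pattern: re.Pattern = FILE_NAME_PATTERN) -> bool:
    """True if the whole file name matches the allowed-characters pattern."""
    return pattern.fullmatch(file_name) is not None


class FileNameTests(unittest.TestCase):
    def test_plain_name_is_uploaded(self):
        self.assertTrue(should_upload("report-2013.pdf"))

    def test_name_with_apostrophe_is_uploaded(self):
        # Regression test written after the production bug was reported.
        self.assertTrue(should_upload("O'Brien invoice.pdf"))

    def test_old_pattern_skipped_apostrophes(self):
        # Documents the original bug: the old pattern rejects the name.
        self.assertFalse(should_upload("O'Brien invoice.pdf", OLD_FILE_NAME_PATTERN))


if __name__ == "__main__":
    unittest.main(exit=False)
```

Writing the failing test first, then changing the constant, is what made the fix quick to verify: the same test now guards against the bug coming back.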

I could possibly have prevented this by doing more thorough testing to begin with, but I’m pretty sure that would have taken more effort than it took me to fix the production bug by writing a test for it and repackaging the script. So for me, finding that bug ‘late’ didn’t increase the cost whatsoever.

Unless you’re working on mission critical software, shipping some bugs into production is almost always better than shipping no software at all. If you make very small, frequent deployments into production, the cost of fixing bugs once they have gone live will only be marginally greater than the cost of trying to find every bug before you ship. Ironically, the longer you spend making sure your requirements are 100% correct and everything is 100% tested, the more likely your software is to be out of date, and hence incorrect, once you finally go live.