This is a talk I delivered at CukeUp! Australia on Friday 20 November in Sydney, Australia.
Automated acceptance testing is hard. I’ve been doing it for over 10 years and it can still be a struggle! There are at least 500 things you shouldn’t do.
There’s never a one-size-fits-all solution, you see. That’s why we all come to these conferences, isn’t it? Because we’re still trying to work things out!
But what are the hardest parts? How can we work together to make things as simple as they can possibly be?
In my experience it’s easy to write all the specs, rules and examples. We can make them comprehensive, and we can think up different scenarios and examples. That’s the fairly easy bit.
But when we convert those straight into automated acceptance tests the fun begins! We end up with a huge number of complicated tests for large, complicated systems, which hinders velocity rather than enhances it.
The reason we have automated tests is to speed things up rather than slow us down. Automated tests, done well, can increase both quality and velocity at the same time. Nothing else can do that.
Imagine driving a container ship into work each day. Would it be fast? Would it be possible to park in the car park? Would it be economical? There’s a large chance you’d arrive 3 days late, have caused multiple accidents on the way, and have cost a fortune in fuel. Large, unwieldy test suites are the same.
But it doesn’t have to be this way. I’ve seen it done right, as right as it can be, but doing it ‘right’ is not that black and white: it depends on your organization.
If testing were a vehicle, what would it look like? What do you need to be driving to get to where your organization is going each day? Everyone’s will be different and suit their type of journey: we don’t all drive Ferraris (or need to). How do we make the mythical vehicle our organization needs? Is safety or versatility or speed the priority for your organization?
My time on high-performing digital teams has shown me how automated testing can enable us to be the first to market for so many things! With organizations such as Domino’s I was involved in lots of firsts: first online pizza tracker, first GPS driver tracker, first pizza chain with an Android watch app, Apple Watch app and Chromecast app. Quick releases and cutting-edge technology were only possible because we were confident in our tests: we had our testing safety blanket to push on green.
We released fast and released hard. Our automated test suite, which ran completely in parallel in 10-20 minutes, would have required 184 QA staff to execute manually in the same amount of time. There was too much at stake in such a successful company to not have confidence in every release: we had three testers at Domino’s and couldn’t have had the required level of confidence without automated tests.
I’d like to share some of these insights, what our safety blanket was made of, this afternoon.
When most people think about automated acceptance tests, they usually just think about any tests that represent user functionality or acceptance of a system, and they put them in one big lump; they don’t differentiate these.
It’s easy to differentiate unit tests from integration tests; not many confuse the two, and most people agree about having a healthy test pyramid.
We need to separate acceptance and end-to-end tests because they do different things, need to be run separately and we need to get the proportions of each right.
There are some key differences between automated acceptance tests and automated end-to-end tests.
An automated acceptance test should be targeted and test one thing. It should test that thing in the most efficient way possible, whether that’s using a testability feature in your app that allows directly accessing your functionality, or an API or end-point.
An automated end-to-end test should cover a realistic user journey through your app, and use the app the same way as a user does. You need to severely limit how many of these you create, as you only need a handful for a medium-to-large app. There are surprisingly few common flows in most apps: we develop a lot of functionality our users don’t use!
These end-to-end tests take the most time to run as they’re “full stack”, and they are typically the most fragile since they contain so many steps. But they do provide a lot of value, as you’ll be confident those critical user journeys in your app are working at any point in time.
An acceptance test for an online tea store may focus on specific shipping policies (e.g. free domestic shipping), or specific payment methods offered (e.g. PayPal, credit cards or POLi).
An end-to-end test would cover a customer arriving at your site, ordering some tea and paying for it via a popular payment method, all in the one test.
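To make the distinction concrete, here is a minimal sketch in Python. The `TeaStore` class, its methods and the flat international shipping fee are all invented for illustration; they stand in for whatever API or testability feature your own app exposes.

```python
from dataclasses import dataclass, field


@dataclass
class TeaStore:
    """Hypothetical in-memory stand-in for the tea store's backend."""
    cart: list = field(default_factory=list)
    paid: bool = False

    def shipping_cost(self, country: str, subtotal: float) -> float:
        # Assumed policy for this sketch: domestic (AU) orders ship free,
        # everything else costs a flat 15.00.
        return 0.0 if country == "AU" else 15.0

    def add_to_cart(self, item: str, price: float) -> None:
        self.cart.append((item, price))

    def checkout(self, country: str, payment_method: str) -> float:
        if payment_method not in {"paypal", "credit_card", "poli"}:
            raise ValueError(f"unsupported payment method: {payment_method}")
        subtotal = sum(price for _, price in self.cart)
        self.paid = True
        return subtotal + self.shipping_cost(country, subtotal)


def test_domestic_shipping_is_free():
    # Acceptance test: one rule, checked directly, no browsing or checkout.
    assert TeaStore().shipping_cost("AU", subtotal=20.0) == 0.0


def test_customer_can_buy_tea():
    # End-to-end test: one realistic journey from filling the cart to paying.
    store = TeaStore()
    store.add_to_cart("Earl Grey", 12.50)
    total = store.checkout(country="AU", payment_method="credit_card")
    assert store.paid
    assert total == 12.50


test_domestic_shipping_is_free()
test_customer_can_buy_tea()
```

The acceptance test touches only the shipping rule, so it stays fast and fails for one reason. The end-to-end test exercises several pieces at once, which is why you want only a handful of them.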
It’s important to differentiate these as we want to limit our end-to-end tests and focus on making our acceptance tests as fast and as targeted as possible.
You heard the terms ‘imperative’ and ‘declarative’ yesterday. I tend to avoid these as I find them confusing (Cucumber is imperative at heart) and use ‘intention’ and ‘implementation’ instead.
When you’re writing scenarios it’s easy to put implementation detail in them, but resist the temptation, as this doesn’t lead to business readability and longevity of your specifications.
I’ve seen some horrendous Cucumber scenarios. As a general rule, try to avoid putting specific UI design elements of your app, or technical implementation details, into your scenarios.
For example, steps that involve clicking specific values are tied directly to the UI, and any slight change will quickly break your tests. Refactoring plain English scenarios is hard; it’s much easier to refactor the code underneath. For example, a page object model lets you model a page so that any changes to it are made in code, in a single place.
If you put XPath selectors to locate elements directly in your Cucumber scenarios (don’t laugh, I have seen it done), stop it. Stop it now.
Keeping your features and scenarios about ‘intention’ (what the system should do) rather than ‘implementation’ (how the system does it) makes them so much clearer and easier to understand, plus you can change your test’s implementation (to use an API for example) without any changes to your specifications. An intention focus is less time consuming to develop and has more chance of staying relevant: win!
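As a rough illustration of keeping implementation out of the scenario, the sketch below backs an intention-level step such as “When the customer orders Earl Grey” with a page object. It is written in plain Python rather than any particular Cucumber binding, and `FakeBrowser`, `TeaShopPage` and the selectors are all made up for the example.

```python
class FakeBrowser:
    """Hypothetical stand-in for a real browser driver; it only records clicks."""

    def __init__(self):
        self.clicks = []

    def click(self, selector: str) -> None:
        self.clicks.append(selector)


class TeaShopPage:
    """Page object: the only place that knows about selectors."""

    EARL_GREY = "#product-earl-grey"
    ADD_TO_CART = "button.add-to-cart"
    CHECKOUT = "a.checkout"

    def __init__(self, browser: FakeBrowser):
        self.browser = browser

    def order_earl_grey(self) -> None:
        for selector in (self.EARL_GREY, self.ADD_TO_CART, self.CHECKOUT):
            self.browser.click(selector)


def when_the_customer_orders_earl_grey(browser: FakeBrowser) -> None:
    # Step definition: states the intention and delegates the "how".
    TeaShopPage(browser).order_earl_grey()


browser = FakeBrowser()
when_the_customer_orders_earl_grey(browser)
print(browser.clicks)
```

If the checkout link changes, only the `CHECKOUT` constant moves; the scenario text and the step definition stay as they are.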
Every test is an island. It’s important that every automated test you write is as independent as possible. Dependencies between tests make them hard to write/debug and impossible to run in parallel.
They are also much harder to run at all as you’ll often need complicated solutions around test ordering and storing test state and data between tests and test runs.
Aim to write individual, independent tests, and use testability features to set up any pre-conditions so that you can test what you need to test. In my example here I use a precondition (given) statement that uses an application testability hook to retrieve a recent order and display it. If you had to create an order (using a different test) in order to run this, it would be much more complicated and harder to maintain. Keep them simple and independent. 😀
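A minimal sketch of the idea in Python follows. The `App` class and its `testability_create_order` hook are assumptions made for illustration, and the hook here seeds an order rather than retrieving an existing one; the point is only that the test arranges its own precondition instead of relying on another test.

```python
import uuid


class App:
    """Hypothetical app with a test-only hook for seeding data."""

    def __init__(self):
        self._orders = {}

    def testability_create_order(self, items):
        # Test-only hook (an assumption for this sketch): creates an order
        # directly, bypassing the UI and any other test.
        order_id = str(uuid.uuid4())
        self._orders[order_id] = {"items": list(items), "status": "placed"}
        return order_id

    def view_order(self, order_id):
        return self._orders[order_id]


def test_customer_can_view_recent_order():
    app = App()  # fresh state, nothing shared with other tests
    order_id = app.testability_create_order(["Earl Grey"])  # Given a recent order
    order = app.view_order(order_id)  # When the customer views it
    assert order["items"] == ["Earl Grey"]  # Then its details are shown
    assert order["status"] == "placed"


test_customer_can_view_recent_order()
```

Because the test owns its precondition, it can run on its own, in any order, or alongside other tests.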
A lot of people obsess about code coverage and metrics but I think you should obsess about outcomes of your automated testing effort instead. You need to ask yourself why you’re doing automated testing in the first place, and measure if what you’re doing allows you to do that.
It shouldn’t be because it’s cool (because it is), it shouldn’t be because someone else is doing it (because they are), but it should be because you’re trying to do something for your business, like reduce your time to market, or increase the quality of your user experience, and you want confidence every time you release.
If you understand this, then you’ll understand that there’s no point in having 100% unit test coverage if you have showstopper bugs every time you release, or it still takes a team of ten testers three weeks to manually regression test your application.
I personally love the aim of zero scripted manual regression testing with zero showstoppers after every release, with as few automated tests as possible to reach this outcome. Automated tests are time consuming and costly: why have more than you need to reach your desired outcome?
Once you work this out, it can guide your process: you can implement automated testing for a valid business reason and make sure it is meeting its objective.
Did you find a showstopper after a release? Well just write a new automated test to ensure it doesn’t happen again.
Are you having to manually regression test a new feature? Add new automated tests until you feel confident you don’t need any manual regression testing any more.
There’s nothing good about manual scripted regression testing.
But manual testing is amazing!
Automated testing is great for regression testing of existing functionality. But it sucks for story testing: it doesn’t explore your system.
Automated regression tests enable you to spend more time on real testing. Better manual testing, better story testing, better exploratory testing.
Testing a story, exploring your system, spot checking different devices and the ways you can use them, trying different user flows, trying different languages and cultures, looking for weird quirks: this is all great stuff that we humans excel at, and machines don’t.
In your automated testing fury, don’t forget good old human manual testing. Our users are human after all (at least for now).
In all my time, I’ve never seen highly effective automated tests developed in isolation from the application development team.
If everyone on the team is responsible for quality, then you can make them work. This can be hard sometimes, as it’s often culturally ingrained that software development isn’t responsible for quality, but it’s definitely doable and builds momentum once a culture of quality is seeded. It just takes one or two developers to see the light and see the massive benefits that automated testing brings. I like the concept of a quality advocate as a role on a cross-functional team: someone who doesn’t assure quality but simply advocates for it.
A build light connected to your continuous integration pipeline, placed somewhere highly prominent and with entertaining/humorous consequences for build breakers, is a great place to start to ensure team ownership.
Your chances of success are magnificently increased if you get everyone on-board and working together.
Unless you’re running your tests frequently, they’ll get out of date quickly. Run them every time someone checks in code. If you’ve made an awesome new vehicle and you keep adding enhancements, drive it after you add new things: the engine will keep running smoothly, and you can check if it still drives!
Don’t wait until you’ve added a whole bunch of new things to test if it works.
All tests should belong in the same source control repository as your app as this means they won’t get out of date and there’s no excuse not to work on them. Put it all under the bonnet: this makes it obvious.
By running your tests frequently you’ll keep them up to date with the code-base and be more likely to succeed.
As you build up more and more automated tests, they’ll take longer to run. You’ll need to start running them in parallel to speed up the total run time. You can quickly spin up build agents to run tests and collate results on a continuous integration testing server.
It’s best to do this from the start, as you’ll need to design your tests to be independent from each other, and not share any data or system state, otherwise you’ll experience conflicts between tests running at the same time.
Running tests in parallel is not only quicker, it’s also more representative of real-world use: you wouldn’t have a single user using your app in real life, unless it’s MySpace.
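As a small sketch of what “independent” means in practice, the Python below runs a few stand-in tests concurrently with the standard library’s `ThreadPoolExecutor`. The test names and `run_order_test` are hypothetical; a real suite would hand this job to its test runner or to CI build agents.

```python
from concurrent.futures import ThreadPoolExecutor
import uuid


def run_order_test(test_name):
    # Each test builds its own uniquely named customer and its own state,
    # so nothing is shared with tests running at the same time.
    customer = f"{test_name}-{uuid.uuid4().hex[:8]}@example.com"
    orders = {customer: ["Earl Grey"]}  # stand-in for seeding real test data
    passed = orders[customer] == ["Earl Grey"]  # stand-in for the real assertion
    return test_name, passed


test_names = ["free_shipping", "paypal_payment", "credit_card_payment", "poli_payment"]

# Run all four tests at once and collate the results by test name.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_order_test, test_names))

failed = [name for name, passed in results.items() if not passed]
print(f"{len(results)} run, {len(failed)} failed")
```

The unique customer per test is the important part: two tests that both used a fixed account such as a shared test login could trip over each other’s orders when run side by side.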
At Domino’s Digital we started running all our automated tests against 3 or 4 different browsers for every run. It quickly became a nightmare!
Different browsers behave slightly differently and we spent a lot of effort trying to solve slight inconsistencies. And the extra browser runs didn’t even find browser-specific bugs.
Most browser-specific bugs are about layouts or aesthetics, and won’t be found by functional automated tests anyway.
Pick the browser most used by your customers (most likely Chrome) and automate against that.
Reliable automated tests against a single browser trump flaky automated tests against every browser, 86% of the time, every time.
Automated acceptance tests cost a lot of money, so you need to get a lot of value back.
Look for any opportunity to extract extra value for little outlay.
What if you could run your same tests against every language your app supports and capture screenshots for the i18n team? Little effort -> max value.
What if you ran your automated end-to-end tests against production after every deployment so you have almost instant confidence your deployment was successful? Little effort -> max value.
At Domino’s we ran all our end-to-end tests after every staging and production deployment. We needed to make sure the orders they created carried certain identifiers so they could be cancelled automatically, but this was trivial compared to the benefits the tests yielded. A green acceptance run made us very confident that a deployment had succeeded.
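The talk doesn’t say what those identifiers looked like, so the Python below is only a guess at the general shape: orders placed by tests carry a recognisable marker, and a clean-up step cancels anything that has it. `TEST_MARKER`, `place_order` and `cancel_test_orders` are all hypothetical names.

```python
TEST_MARKER = "AUTOMATED-TEST"  # hypothetical identifier, not the one Domino's used


def place_order(orders, customer_name, items):
    order = {"customer_name": customer_name, "items": items, "status": "placed"}
    orders.append(order)
    return order


def cancel_test_orders(orders):
    """Cancel every placed order whose customer name carries the test marker."""
    cancelled = 0
    for order in orders:
        if TEST_MARKER in order["customer_name"] and order["status"] == "placed":
            order["status"] = "cancelled"
            cancelled += 1
    return cancelled


orders = []
place_order(orders, "Jane Citizen", ["Margherita"])  # a real customer
place_order(orders, f"{TEST_MARKER} smoke run", ["Hawaiian"])  # created by a test
cancelled_count = cancel_test_orders(orders)
print(cancelled_count, [order["status"] for order in orders])
```

Whatever marker you choose, it needs to be something a real customer could never produce by accident, since the clean-up step trusts it completely.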
Always be seeking value from your tests. Run them in dev, test, staging and production.
The title of my talk said I had 500 don’ts but I actually don’t have that many.
Don’t stop trying things. Try the craziest combinations of things you can imagine. You can be pretty sure your users will try it too.
Everything is about experimentation.
Just because I said to do something here this afternoon is no guarantee it’ll work for you. These are my experiences, which I hope you can learn from, or adapt an idea from, to build your own safety blanket for your testing journey.
You’ll work it out: look at it from angles only humans can.
Just don’t stop being awesome!