AMA: Test Data Infrastructure

Anonymous asks…

Have you set up (inexpensive) infrastructure to store data collected in your automated tests? We are currently using Selenium WebDriver (Java) to automate our tests and IntelliJ as our IDE. We create data from scratch for each and every test case :(

My response…

I’m a little confused about whether the question is about test data (data that is needed by the automated tests) or test results data (insights into the results of our automated tests), so I’ll answer both 😀

Infrastructure to manage test data

Our tests run on specific test accounts and sites on production databases. Since our tests are end-to-end in nature, we try to make them depend as little as possible on existing data. Often an end-to-end scenario will involve creating, viewing, editing and deleting something. If we don’t do all of this through our UI, we can use hooks that call services or database jobs to clean up the data. I explained this in more detail previously.
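A minimal sketch of that create/clean-up pattern in Python, assuming a hypothetical `FakeSiteService` that stands in for whatever service or API your application exposes. The point is that the cleanup hook runs even when the scenario fails part-way or has already deleted the data through the UI, so leftovers don't accumulate on the test site.

```python
import uuid
from contextlib import contextmanager


class FakeSiteService:
    """Hypothetical stand-in for a service/API that manages posts on a test site."""

    def __init__(self):
        self.posts = {}

    def create_post(self, title):
        post_id = str(uuid.uuid4())
        self.posts[post_id] = {"title": title}
        return post_id

    def update_post(self, post_id, title):
        self.posts[post_id]["title"] = title

    def delete_post(self, post_id):
        # Idempotent: deleting an already-deleted post is not an error,
        # so the cleanup hook is safe even if the UI flow deleted it.
        self.posts.pop(post_id, None)


@contextmanager
def temporary_post(service, title):
    """Create a post for one scenario and always clean it up afterwards."""
    post_id = service.create_post(title)
    try:
        yield post_id
    finally:
        # Cleanup hook: runs whether the scenario passed or raised.
        service.delete_post(post_id)
```

In a real scenario the body of the `with temporary_post(...)` block would drive the UI (view, edit, maybe delete), and the hook only tidies up whatever is left.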

Infrastructure to manage test results data

We use CircleCI for automated end-to-end tests. We have a number of projects that run different types of end-to-end tests from the same code repository for different purposes (canary tests, visual-diff tests, full regression tests for example).
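One way several CI projects can share one repository is for each project to pass a suite name (for example via an environment variable) that selects which specs to run. This is only a sketch: the `TEST_SUITE` variable name and the spec paths are hypothetical, not how our repository is actually laid out.

```python
import os

# Hypothetical mapping of suite names to spec file patterns.
SUITES = {
    "canary": ["specs/canary/*.js"],
    "visual-diff": ["specs/visual/*.js"],
    "regression": ["specs/**/*.js"],
}


def specs_for(suite_name=None):
    """Return the spec patterns for the suite a CI project asks for."""
    # Explicit argument wins; otherwise read the (hypothetical) env var.
    name = suite_name or os.environ.get("TEST_SUITE", "regression")
    try:
        return SUITES[name]
    except KeyError:
        raise ValueError(
            f"unknown suite {name!r}; expected one of {sorted(SUITES)}"
        ) from None
```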

We generate xUnit-style test results (from Mocha/Magellan), which CircleCI uses to provide insights into our test results such as this:

You can also drill down into the slowest tests, the most frequently failing tests, and so on.
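To show what those reports contain, here is a rough sketch that pulls the slowest and failed test cases out of a single JUnit/xUnit-style XML report. This is not how CircleCI computes its insights (it aggregates across many builds); it only illustrates the `<testcase>`, `time` and `<failure>` data the report carries.

```python
import xml.etree.ElementTree as ET


def summarise_junit(xml_text, top=3):
    """Return (slowest test names, failed test names) from JUnit-style XML."""
    root = ET.fromstring(xml_text)
    cases = []
    failed = []
    # iter() finds <testcase> elements whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        name = f'{case.get("classname", "")} {case.get("name", "")}'.strip()
        cases.append((float(case.get("time", 0)), name))
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(name)
    # Sort by duration, longest first, and keep the top N names.
    slowest = [name for _, name in sorted(cases, reverse=True)[:top]]
    return slowest, failed
```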

Since all our tests are open source you can view these build insights yourself!

We’re pretty happy with the insights we get from CircleCI at the moment, so we don’t currently see a need to develop anything ourselves.

AMA: product APIs for test automation

Michael Karlovich asks…

What’s your design approach for incorporating internal product APIs into test automation? I don’t mean in order to explicitly test them, but more for leveraging them to stage data and set application states.

My response…

As explained previously, in my current role at Automattic I primarily work on end-to-end automated tests for WordPress.com. These tests run against live data (Production) no matter where our UI client (Calypso) is running (for example on localhost), so we don’t use APIs for staging data or setting application state.

In previous roles we utilised a REST API to create dynamic data for an internally used web application, an approach we found useful (even necessary) for repeatable UI tests.
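A REST call like that can be sketched with Python's standard library. The `/api/customers` endpoint and payload fields are hypothetical; the idea is that each run generates unique values, so repeated runs don't collide with data a previous run left behind. The function only builds the request; a real test would send it with `urllib.request.urlopen(request)`.

```python
import json
import uuid
import urllib.request


def build_create_customer_request(base_url, prefix="e2e"):
    """Build (but don't send) a POST that creates a uniquely named customer.

    The /api/customers endpoint and the payload fields are hypothetical.
    """
    suffix = uuid.uuid4().hex[:8]  # unique per call so reruns don't collide
    payload = {
        "name": f"{prefix}-{suffix}",
        "email": f"{prefix}+{suffix}@example.com",
    }
    request = urllib.request.Request(
        f"{base_url}/api/customers",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return request, payload
```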

We also utilised test controllers to set web application state for a public website. These test controllers were very handy: visiting a URL such as http://myteststore.com/testsetup/checkout would set up an order with products in your session and instantly display the checkout page, a page that would typically take 8 steps to reach from the start of the process.
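A test controller like that might look roughly like the following minimal WSGI sketch. The in-memory `SESSIONS` store, the SKUs and the cookie format are all hypothetical; a real application would use its own session mechanism. The `request` helper calls the app directly, without a server, to show the redirect flow.

```python
import uuid
from wsgiref.util import setup_testing_defaults

SESSIONS = {}  # hypothetical in-memory session store: session id -> data


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/testsetup/checkout":
        # Test-only shortcut: do in one request what the ~8 setup steps
        # (browsing, adding products to the cart, etc.) would normally do.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"cart": [{"sku": "TEST-SKU-1", "qty": 1},
                                         {"sku": "TEST-SKU-2", "qty": 2}]}
        start_response("302 Found", [
            ("Location", "/checkout"),
            ("Set-Cookie", f"session={session_id}; Path=/"),
        ])
        return [b""]
    if path == "/checkout":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"checkout page"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def request(path):
    """Call the WSGI app directly and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```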

This saved us lots of time and made our specific tests more deterministic as we could avoid the 8 or so ‘setup’ steps and use a single URL to access our page.

This approach had a couple of downsides: the test controllers could never be deployed to production, and using them skipped the ‘setup’ steps that are part of a realistic user flow. We did two things to mitigate these risks: firstly, we ensured through config that the test controllers were never deployed to production, and secondly, we made sure we had some end-to-end coverage so we were still testing some real user flows.
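The config guard could be as simple as the sketch below. The `APP_ENV` and `ENABLE_TEST_CONTROLLERS` variable names are hypothetical; the design choice is to fail safe, so a missing environment name is treated as production and production never gets the test routes, even if the flag is set there by mistake.

```python
import os


def allow_test_controllers(env=None):
    """Decide whether test-only routes may be registered."""
    env = os.environ if env is None else env
    # Fail safe: an unset APP_ENV is treated as production.
    if env.get("APP_ENV", "production") == "production":
        return False
    return env.get("ENABLE_TEST_CONTROLLERS", "false").lower() == "true"


def register_routes(routes, env=None):
    """Return a copy of the route table, adding test routes only when allowed."""
    routes = dict(routes)
    if allow_test_controllers(env):
        routes["/testsetup/checkout"] = "test_setup_checkout_handler"
    return routes
```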

AMA: handling the database

Andy asks…

How are you handling the db in automation suites?

I’m running into issues where the test DB is, by necessity, a rather weighty 900 MB, so a simple drop and restore from known backup is hugely time-consuming.

“If you automate a mess, you get an automated mess.” - Rod Michael

My response…

In my current role at Automattic I primarily work on end-to-end automated tests for WordPress.com. These tests run against live data (Production) no matter where our UI client (Calypso) is running (for example on localhost), so we just make sure our config points to the data that we need (test sites) and create other test data within the e2e scenarios.
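The separation between where the UI runs and where the data lives can be expressed in config. This is a hypothetical sketch: the key names, URLs and site names are placeholders, not our real configuration. Only the UI location varies by environment; the test sites stay the same pre-existing data.

```python
import json

# Hypothetical config: key names, URLs and site names are placeholders.
CONFIG = json.loads("""
{
  "uiBaseUrl": {
    "local": "http://localhost:3000",
    "production": "https://app.example.com"
  },
  "testSites": ["e2e-site-one.example.com", "e2e-site-two.example.com"]
}
""")


def targets(ui_env, config=CONFIG):
    """Return (UI base URL for ui_env, test sites shared by every env)."""
    try:
        base_url = config["uiBaseUrl"][ui_env]
    except KeyError:
        raise ValueError(f"no UI base URL configured for {ui_env!r}") from None
    return base_url, list(config["testSites"])
```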

In previous organisations I have used a scaled-down backup of production that had specific test data ‘seeded’ into it. Our DBAs had a set of scripts that would take a backup and cleanse/remove a large amount of data (for example, archived products and orders), which resulted in a small, manageable backup that we could quickly restore into an environment. I found this to be a good approach: it gave us realistic data, but restoring it when necessary (e.g. before a CI test run) wasn’t time-consuming.
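Those DBA scripts were specific to our schema and database, but the shape of the job can be sketched with SQLite. The `products`/`orders` tables, the `archived` flag and the cutoff date are hypothetical: remove bulky data the tests don't need, seed known records, then compact the database.

```python
import sqlite3


def cleanse_and_seed(conn, keep_orders_since):
    """Shrink a restored production copy and seed known test data.

    Table and column names are hypothetical; adapt to your own schema.
    """
    cur = conn.cursor()
    # Remove orders for archived products, plus old orders the tests never need.
    cur.execute("DELETE FROM orders WHERE product_id IN "
                "(SELECT id FROM products WHERE archived = 1)")
    cur.execute("DELETE FROM orders WHERE created < ?", (keep_orders_since,))
    cur.execute("DELETE FROM products WHERE archived = 1")
    # Seed the specific records the tests rely on.
    cur.executemany("INSERT INTO products (name, archived) VALUES (?, 0)",
                    [("TEST product A",), ("TEST product B",)])
    conn.commit()
    # Reclaim the space freed by the deletes (must run outside a transaction).
    conn.execute("VACUUM")
```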

I also shared some other data creation techniques in a previous answer.

AMA: managing test data

Cameron asks…

I’m new to test automation. I’m writing Selenium/Protractor tests in C# within the project solution, which allows developers to run all of my UI tests alongside their own unit tests.

The project is all very new, and big chunks aren’t built yet. I’m trying to grow my tests along with the project as each function is fleshed out.

I’m struggling with test data! The BAs have had a tool built for them which allows them to create series of test data in XML and have it all imported. This seems a bit cumbersome for my uses, and I’d prefer to seed my test data programmatically. I have mostly figured out how to use the data layer of our application to get data in there, but the amount of test data being created is very quickly getting out of hand and it’s very hard to manage.

Should each test case seed its own test data as part of the test run? This would have the benefit that if requirements change, the test will fail, and I can go directly to it and amend the test data to match the new requirements.

Or, should test data be separated out in a central location?

My response…

I answered a similar question to this yesterday, so it might help to read that first.

It’s great to hear you’re writing tests alongside the application code: I have found this to lead to better collaboration and increased usefulness and adoptability of automated testing.

As per that other post, I find that a combination works quite well: seeding test data in a central location that is generic enough to be used across many different tests, and programmatically creating/destroying data in test hooks (via scripts or APIs). I avoid manually creating data as much as possible, as this isn’t easily repeatable.
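One way to combine the two is a small test data builder: generic defaults live in one central place, each test overrides only the fields it cares about, and everything a test creates is tracked so a teardown hook can remove it. In this sketch the `Customer` fields are hypothetical and a plain dict stands in for your application's data layer.

```python
import itertools
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Customer:
    """Central default test customer; field names are hypothetical."""
    name: str = "Test Customer"
    country: str = "AU"
    credit_limit: int = 1000


class CreatedData:
    """Tracks what one test created so a teardown hook can remove it."""

    _ids = itertools.count(1)

    def __init__(self, store):
        self.store = store  # stands in for your application's data layer
        self.created = []

    def customer(self, **overrides):
        # Start from the shared defaults; a test states only what it cares about.
        customer = replace(Customer(), **overrides)
        customer_id = next(self._ids)
        self.store[customer_id] = customer
        self.created.append(customer_id)
        return customer_id, customer

    def cleanup(self):
        # Delete newest first, in case later records depend on earlier ones.
        for customer_id in reversed(self.created):
            self.store.pop(customer_id, None)
        self.created.clear()
```

With `unittest`, for example, a test's `setUp` could create `self.data = CreatedData(store)` and register `self.addCleanup(self.data.cleanup)`. When requirements change, a new default is updated once in `Customer`, while tests that depend on a specific value still say so explicitly in their overrides.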