Test Automation: why ‘record and replay’ fails

Everyone knows ‘record and replay’ is an immature approach to test automation that leads to fragile automated test suites. Right?

Well, the reason I’ve thought this for a very long time is that most record and replay tools (Selenium IDE, for example) duplicate element locators throughout the scripts they create.

For example, if I record five variations of signing up for a WordPress.com site, I end up with five ‘copies’ of how Selenium locates elements on those signup screens. If any of those elements change, all five tests break. This is test suite fragility.
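A small sketch makes the duplication problem concrete. The script contents and names below are invented for illustration, but the shape is what a recorder emits: each recorded variant hard-codes its own copy of the same locator, so one change to the element breaks every recording at once.

```python
# Hypothetical illustration of five recorded signup scripts. Each one
# hard-codes the same locator string rather than sharing a single definition.
RECORDED_TESTS = {
    f"signup_variant_{n}": [
        "click  css=#signup-button",         # same locator, copied per script
        f"type   css=#username  user{n}",
    ]
    for n in range(1, 6)
}

def broken_tests(recorded_tests, removed_locator):
    """Return the names of recorded scripts that still reference a locator
    that no longer exists in the application."""
    return [
        name for name, steps in recorded_tests.items()
        if any(removed_locator in step for step in steps)
    ]

# Rename '#signup-button' in the app, and every recording breaks:
print(len(broken_tests(RECORDED_TESTS, "css=#signup-button")))  # 5
```

With a shared locator definition, the same change would be a one-line fix instead of five broken scripts.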

Well, now there’s some evidence showing this is actually why ‘record and playback’ fails.

Your web application regression tests created using record/replay tools are fragile and keep breaking. Hammoudi et al. set out to find out why. If we knew that, perhaps we could design mechanisms to automatically repair broken tests, or to build more robust tests. The authors look at 300 different versions of five open source web applications, creating test suites for their initial versions using Selenium IDE and then following the evolution of the projects.

Over 70% of all breakages are due to locator fragility in the face of change, and over 50% of all breakages are further due to attribute-based locators.

Adrian Colyer – Why do record/replay tests of web applications break?

This also highlights a second reason why record and replay fails: as well as being duplicated, those locators may not be the best locators available.

For example, a record and replay tool like Selenium IDE may select elements by id or XPath, even though these may change with every version of the application, whereas a class may be more stable and therefore more appropriate. Better still, custom data-attributes might be the best solution of all. But a record and replay tool isn’t going to suggest using data attributes that may not even exist (until you add them yourself).
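That preference order can be sketched as a simple rule. This is a hypothetical helper (the attribute name `data-test-id` is an assumption, not something a recorder produces): prefer a custom data-attribute, fall back to a stable class, and use the id only as a last resort, since generated ids tend to change between releases.

```python
# Hypothetical sketch of the locator preference argued for above.
# Input is a dict of an element's HTML attributes; output is a CSS selector.
def best_locator(element_attrs):
    """Pick the least brittle CSS selector for an element."""
    if "data-test-id" in element_attrs:
        # Custom test attribute: exists only for testing, so it rarely changes.
        return f'[data-test-id="{element_attrs["data-test-id"]}"]'
    if "class" in element_attrs:
        # First class name: often more stable than a generated id.
        return "." + element_attrs["class"].split()[0]
    if "id" in element_attrs:
        # Last resort: ids are frequently auto-generated and version-specific.
        return "#" + element_attrs["id"]
    raise ValueError("no usable attribute; consider adding a data-test-id")

# Even when a (possibly generated) id is present, the data attribute wins:
print(best_locator({"id": "btn-1837", "data-test-id": "signup-submit"}))
```

A recorder typically does the opposite, grabbing whatever id or XPath is in front of it at record time.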

My conclusion from reading this study is that it confirms my suspicion that record and replay tools create fragile test suites.

By using a time-proven, thoughtful approach, like defining page and component models with non-duplicated, hand-picked, least-brittle element locators, you will realize the long-term benefits of a maintainable automated test suite.
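A page model along those lines can be sketched in a few lines. The names here are illustrative (this is not WordPress.com’s actual test code), and a fake driver stands in for a real WebDriver so the sketch is self-contained: the point is that the locator lives in exactly one place, so all tests that sign up go through it.

```python
# Minimal page-object sketch: locators are defined once, on the page class,
# rather than being copied into every recorded script.
class SignupPage:
    # Single source of truth for this page's locators.
    USERNAME_FIELD = '[data-test-id="signup-username"]'
    SUBMIT_BUTTON = '[data-test-id="signup-submit"]'

    def __init__(self, driver):
        self.driver = driver

    def sign_up(self, username):
        self.driver.type(self.USERNAME_FIELD, username)
        self.driver.click(self.SUBMIT_BUTTON)

class FakeDriver:
    """Stand-in for a real WebDriver, recording actions instead of driving a browser."""
    def __init__(self):
        self.actions = []

    def type(self, locator, text):
        self.actions.append(("type", locator, text))

    def click(self, locator):
        self.actions.append(("click", locator))

driver = FakeDriver()
SignupPage(driver).sign_up("alice")
print(driver.actions)
```

If the submit button’s markup changes, only `SignupPage.SUBMIT_BUTTON` needs updating, and every signup test picks up the fix.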

Author: Alister Scott

Alister is an Excellence Wrangler for Automattic.

10 thoughts on “Test Automation: why ‘record and replay’ fails”

    1. I like recorders to give me a base to work with, then I change what’s required, like using specific IDs created for testing purposes; it also makes sure I’m not skipping a step. But I totally agree that record and replay alone is not good practice.


      1. There are also weaknesses to task-based methods (login, search_item, add_to_cart) because they are too limited in scope. The answer is either to have massive login and search_item methods with select statements within, or multiple task-based methods covering similar behaviours. In the former, you get a maintenance nightmare of parameter options and many places to fail within one method, solved by many debug statements. In the latter, you get to update object identifiers within multiple methods. Page objects simplify your life.


  1. Have you checked out Sahi Pro, which doesn’t rely on XPath or CSS selectors? It uses its own wrappers around the JavaScript DOM to identify elements. In cases where ids are missing, we can use relational APIs like _near, which help us identify elements with respect to another known element. In cases where elements have the same name, it automatically adds an index.

    Though it has an excellent recorder, one can always build on the recorded snippets of code. One can also call Java libraries from Sahi Script.

    I would be interested in your comments on Sahi Pro (www.sahipro.com)


  2. On the other hand, maybe we are looking at the conclusion the wrong way.
    Instead of just pointing at the limitations, we can push the tool providers / community to fix them.
    I guess it may be easy to make better reuse of locators while recording (just like we would have done manually, by creating a repository),
    and to add the ability to seek out common blocks and convert them into functions.


  3. Full Disclosure: I’m the founder of Testim.io

    Using Selenium IDE as proof that R/P tools don’t work is kinda like using Shaquille O’Neal’s stats to show that humans can’t shoot free throws :)

    The fact that Selenium IDE specifically doesn’t provide you with a way to reuse logic (and locators) doesn’t mean that R/P tools can’t, since there are a few that do. Testim.io does that, and supports passing parameters for each call to the reused part (e.g. http://docs.testim.io/docs/add-parameters-for-groups)

    I do agree, and have talked about this a lot in the last year (SeConf 2016 & Selenium Camp), that no current R/P tool can work if it assumes the app doesn’t change. I.e. most tools make their decisions based on a single (and static) view of the application, and I show a lot of cases where such tools will always fail.

    I also agree that if there’s no automatic maintenance of those locators (in the case of no reuse), the qa/dev is left with so much work (maintaining those locators) that the ROI is not worth it, explaining the close-to-zero retention of those products.
    This is especially true now, in the agile world, where apps change very often.

    That, of course, was the reason I started Testim.io, which not only allows reuse of locators (and parameterization) but also automatically maintains those locators, updating them as you run your tests.
    The more you run your tests, the more robust your locators are.
    I.e. the locators change and improve from run to run.

    My talk at Selenium Camp shows how we use Machine Learning to do a better job.

    We’re hoping to soon get to the point where R/P is not only as good as a dev writing tests manually.. but better :)
    Hopefully, we’ll soon have a case study (done with NetApp), which tried, in parallel, writing code in Selenium (not the IDE) and using Testim, and measured the ROI.

    Oren Rubin


  4. I don’t really want this to turn into a pitch for commercial record and playback tools and how they do things differently.

    I am only interested in open source tools since I work for a company that leads the development of one of the biggest open source products in the world (WordPress), and we have come across various things we’ve wanted to do with our test tools which we were able to implement ourselves because we have access to the source code of our tools.

    I believe that building testable web applications is a better long-term approach than building better commercial testing tools to deal with applications that haven’t been built with testability in mind.


  5. Ya know, I wrote a big long diatribe on this, but decided in the end it comes down to just saying “It’s Automation, Not Automagic”. R/P (Record & Playback) is limited in what you can use it for. I mainly use it for prototyping, but then go and clean up the code/script to work within an already predefined framework. I also work with development to “bake in” testability to the software under test. Key lesson: use your brain first.


Comments are closed.