Today’s conference began with some rather funny commentary shared by Yvette Nameth’s mother from yesterday’s talks. I was mentioned as the ‘flaky’ guy:
My main takeaway from the entire conference is that it seems we get way too caught up on complex solutions for our testing. We need to keep asking ourselves: “what’s the simplest thing that could possibly work?” If we have complex systems why do we need complex tests? We need to take each large complex problem we work on and break it down till we get something small and manageable and solve that problem. Rinse and repeat.
Flaky tests are the bugbear of any automated test engineer; as Alister says “insanity is running the same tests over and over again and getting different results”. Flaky tests cause no end of despair, but perhaps there’s no such thing as a flaky or non-flaky test; perhaps we need to look at this problem through a different lens. We should spend more time building more deterministic, more testable systems rather than building resilient and persistent tests. Alister will share some examples of when test flakiness hid real problems underneath the system, and how it’s possible to solve test flakiness by building better systems.
I was lucky enough to attend the Google Test Automation Conference (GTAC) at Google Kirkland in Washington last week. As usual, it was a very well run conference with an interesting mix of talks and attendees.
Whilst there wasn’t an official theme this year, I personally saw two themes emerge throughout the two days: dealing with flaky tests and running automated tests on real mobile devices.
There weren’t too many talks that didn’t mention flaky automated tests (known as ‘flakes’) at some point. Whilst there were some suggestions for dealing with flaky tests (like Facebook running new tests x times to see if they fail, classifying any that do as flaky and assigning them to the owner to fix), there didn’t seem to be a lot of solutions for avoiding the creation of flaky tests in the first place, which I would have liked to see.
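The "run it x times and classify" idea is simple enough to sketch. This is only an illustration of the general approach, not Facebook's actual tooling: the function names, the default of 10 runs and the three labels are my own assumptions.

```python
from typing import Callable


def classify_test(test: Callable[[], None], runs: int = 10) -> str:
    """Run a test `runs` times and label it by how consistent the results are."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    if failures == 0:
        return "stable"
    if failures == runs:
        return "failing"
    # Mixed results from identical runs: this is the flaky case.
    return "flaky"


def make_intermittent_test(fail_every: int) -> Callable[[], None]:
    """Hypothetical test that fails on every `fail_every`-th call."""
    calls = {"count": 0}

    def test() -> None:
        calls["count"] += 1
        assert calls["count"] % fail_every != 0, "intermittent failure"

    return test


def always_passes() -> None:
    assert True


def always_fails() -> None:
    assert False, "consistent failure"


results = {
    "always_passes": classify_test(always_passes),
    "always_fails": classify_test(always_fails),
    "intermittent": classify_test(make_intermittent_test(3)),
}
print(results)
```

A consistently failing test is kept separate from a flaky one here, because it points at a real defect rather than non-determinism.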
Real Mobile Devices
The obsession with running mobile automated tests on real devices continued from last year’s conference with talks about mobile devices as a service. I personally think we’d be better off spending the time and effort on making more realistic mobile emulators that we can scale rather than continuing the real device test obsession.
My key takeaway was that even highly innovative companies like Google, Facebook and Netflix still struggle to balance software quality and velocity. These companies don’t have a strong presence in Australia, and the IT management of smaller companies here often like to say things like “Google does x” or “Facebook does y”. The problem with this is they only know these companies from the outside. Ankit Mehta’s slides at the beginning of his keynote captured this perfectly and hence were my favorite slides of the conference:
The main theme for today’s talks was Android UI automation with various approaches demonstrated.
Thomas Knych, Stefan Ramsauer and Valera Zakharov from Google gave a highly entertaining presentation about Android testing at scale. This was one of my favorite talks of the conference. They highlighted that insisting on automated testing using real devices is inefficient and problematic, and that you should first run a majority of tests on emulators, which find a majority of the bugs. This is something I have been saying for a long time and it was refreshing to hear it from a Google Android team. Ways to speed up Android emulators include using snapshots for fast restores, as well as using x86 accelerated AVDs. Interestingly, the Google Android team ran 82 million Android automated tests using emulators in March alone (there are approx 2.7 million seconds in March) with only 0.15% of tests being categorized as flaky. This is partly due to using a Google-only automated testing tool for Android called Espresso. Another key takeaway was that if you are using physical devices, don’t glue them to a wall or whiteboard. The devices get hot, melt the glue and get damaged as they hit the floor.
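To put those numbers in perspective, here is the back-of-the-envelope arithmetic. The only inputs are the figures quoted in the talk (82 million tests, 0.15% flaky) and the length of March; the variable names are mine.

```python
tests_run = 82_000_000

# March has 31 days.
seconds_in_march = 31 * 24 * 60 * 60

# Average throughput if the tests were spread evenly across the month.
tests_per_second = tests_run / seconds_in_march

# 0.15% flaky, computed with integers to avoid floating point noise.
flaky_tests = tests_run * 15 // 10_000

print(f"seconds in March: {seconds_in_march:,}")
print(f"tests per second: {tests_per_second:.1f}")
print(f"flaky tests:      {flaky_tests:,}")
```

That works out to roughly 30 tests every second of the month, and even at 0.15% that is still over a hundred thousand flaky results to deal with.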
Guang Zhu (朱光) and Adam Momtaz, also from Google, talked about some historical approaches to Android automation (instrumentation, image recognition and hierarchy viewer) and how to use features in newer Android API versions (16+) to automate tests reliably.
Jonathan Lipps from Sauce Labs demonstrated the very impressive tool Appium, which enables iOS and Android automation using WebDriver bindings, allowing you to use your language of choice with the promise to write once and run across the two platforms. This isn’t exactly true as the selectors will be different, but these can be defined in a module so your test code is readable. Jonathan explained the philosophy behind the tool and even gave a quick demo running against the new FirefoxOS to show its flexibility. One of the limitations mentioned was that you can only run one iOS simulator per physical Apple Mac, which limits continuous integration scalability. It was overall a very impressive, polished tool.
Eduardo Bravo from the Google+ team gave an interesting lightning talk about hands-on experience in testing Google+ apps across Android and iOS. They use KIF for iOS testing. Eduardo was quoteworthy with such gems as “flaky tests are worse than no tests” and “don’t give devs a reason not to write tests”. The hermetic theme was recurrent with the ongoing endeavor to reduce flakiness by using hermetic environments with known canned responses to make tests deterministic. A very enjoyable talk.
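Here is roughly what I mean by defining the selectors in a module. This is a sketch only: the element names, the locator values and the `com.example.app` package are made up, and I am assuming the usual WebDriver convention of a (strategy, value) pair for each locator.

```python
# selectors.py: one place for everything that differs between platforms.
# All names and values below are hypothetical examples.
SELECTORS = {
    "ios": {
        "login_button": ("accessibility id", "Log In"),
        "username_field": ("accessibility id", "Username"),
    },
    "android": {
        "login_button": ("id", "com.example.app:id/login"),
        "username_field": ("id", "com.example.app:id/username"),
    },
}


def selector(platform: str, name: str) -> tuple:
    """Return the (strategy, value) locator for an element on a platform."""
    try:
        return SELECTORS[platform][name]
    except KeyError:
        raise KeyError(f"No selector for {name!r} on {platform!r}") from None


# Test code asks for elements by name and never mentions a platform locator.
for platform in ("ios", "android"):
    strategy, value = selector(platform, "login_button")
    print(f"{platform}: find element by {strategy!r} = {value!r}")
```

The test itself would then pass the pair straight to the driver's find-element call, so the same test body can run on both platforms.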
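To illustrate what a hermetic test with canned responses looks like, here is a minimal sketch. Nothing here comes from the Google+ code base: the `CannedBackend` class, the `/profile/42` path and the response fields are all invented for the example.

```python
import json

# Known, fixed responses keyed by request path (hypothetical data).
CANNED_RESPONSES = {
    "/profile/42": json.dumps({"name": "Ada", "circles": 3}),
}


class CannedBackend:
    """Stands in for the real server: no network, same answer every time."""

    def get(self, path: str) -> str:
        return CANNED_RESPONSES[path]


def profile_summary(backend, user_id: int) -> str:
    """Code under test: formats a profile fetched from whatever backend it is given."""
    data = json.loads(backend.get(f"/profile/{user_id}"))
    return f"{data['name']} ({data['circles']} circles)"


def test_profile_summary() -> None:
    # The backend cannot time out or return different data between runs.
    assert profile_summary(CannedBackend(), 42) == "Ada (3 circles)"


test_profile_summary()
print(profile_summary(CannedBackend(), 42))
```

Because every input is fixed, any failure of this test is caused by the code rather than by the environment.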
Valera Zakharov from the Google Android dev team discussed an internal tool Espresso which makes Android tests much more efficient and reliable, and with less boilerplate code. My only complaint: don’t demo an awesome tool that isn’t open source and available for others to use.
Michael Klepikov from Google talked about using the upcoming ChromeDriver 2 server to access performance metrics from the Chrome Developer Tools. He demonstrated some fancy looking results generated by webpagetest.org. I don’t believe you need ChromeDriver 2 to do this though: the W3C navigation timing spec provides performance metrics right now.
Yvette Nameth and Brendan Dhein from the Google Maps team discussed the challenge of testing large Google Maps datasets, demonstrating a risk based approach: e.g. ensuring the Eiffel Tower is accurate is important, but the accuracy of your Gran’s farm is not.
Celal Ziftci and Vivek Ramavajjala from the University of California, San Diego presented the findings of their work at Google on automatically finding culprits in failing builds. This was a highly interesting talk about creating a tool to analyze multiple change sets in a build and work out which is most suspicious using a couple of heuristics: number of files changed and distance from root. The tool originally took 6 hours to perform an analysis but they reduced this to 2-3 minutes using extensive caching. The tool supports extensible heuristics, so additional intelligence such as keyword analysis can be plugged in.
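As a rough sketch of what I mean: the navigation timing attributes (`navigationStart`, `responseStart`, `domContentLoadedEventEnd`, `loadEventEnd`) are millisecond timestamps, so the metrics are simple subtractions. With WebDriver you could fetch them by executing `return window.performance.timing` as script in the browser; the sample values and the function name below are made up for illustration.

```python
def summarize_navigation_timing(timing: dict) -> dict:
    """Turn raw navigation timing timestamps (ms) into durations (ms)."""
    start = timing["navigationStart"]
    return {
        "time_to_first_byte_ms": timing["responseStart"] - start,
        "dom_content_loaded_ms": timing["domContentLoadedEventEnd"] - start,
        "page_load_ms": timing["loadEventEnd"] - start,
    }


# Illustrative values only, not measurements from a real page.
sample_timing = {
    "navigationStart": 1_000_000,
    "responseStart": 1_000_180,
    "domContentLoadedEventEnd": 1_000_950,
    "loadEventEnd": 1_001_400,
}

summary = summarize_navigation_timing(sample_timing)
for metric, value in summary.items():
    print(f"{metric}: {value}")
```

This only covers page-level timings; it doesn't give you the per-request detail that the Chrome Developer Tools expose.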
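I don't know how the real tool scores or combines its heuristics, so treat the following purely as my guess at the shape of the idea: each heuristic is a function that scores a change set, and change sets are ranked by their total score. The scoring formulas, the assumption that more files and a shorter distance mean more suspicious, and all the names are mine.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChangeSet:
    id: str
    files_changed: int
    distance_from_root: int


# A heuristic is any function that scores a change set (higher = more suspicious).
Heuristic = Callable[[ChangeSet], float]


def by_files_changed(cs: ChangeSet) -> float:
    # Assumption: touching more files makes a change more suspicious.
    return float(cs.files_changed)


def by_closeness_to_root(cs: ChangeSet) -> float:
    # Assumption: a smaller distance from root makes a change more suspicious.
    return 1.0 / (1 + cs.distance_from_root)


def rank_suspects(change_sets: List[ChangeSet], heuristics: List[Heuristic]) -> List[ChangeSet]:
    """Order change sets from most to least suspicious by summed heuristic scores."""
    return sorted(
        change_sets,
        key=lambda cs: sum(h(cs) for h in heuristics),
        reverse=True,
    )


build = [
    ChangeSet("A", files_changed=1, distance_from_root=5),
    ChangeSet("B", files_changed=40, distance_from_root=1),
    ChangeSet("C", files_changed=3, distance_from_root=0),
]
ranked = rank_suspects(build, [by_files_changed, by_closeness_to_root])
print([cs.id for cs in ranked])
```

Adding something like keyword analysis would just mean appending another function to the list, which is presumably what makes the heuristics extensible. A real implementation would also need to normalize the scores so that one heuristic doesn't swamp the others.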
Katerina Goseva-Popstojanova talked about academic analysis of software product line quality. She highlighted that open source software projects are the Promised Land for academia in that the code is fully accessible and can be used for academic analysis and research.
Claudio Criscione from Google discussed Cross Site Scripting (XSS) vulnerabilities and some automated solutions for checking for these.
During the afternoon I went for a tour of the Google New York City office here in Chelsea. All I can say is wow. The view from the 11th floor roof top balcony was very nice too (see pics below).
A very enjoyable and smooth conference and well done to all involved organizing it.
Ari Shamash from Google talked about the consistent issue of non-deterministic (flaky) automated tests and how Google use hermetic environments to highlight these tests. This involves creating 5-20 instances of an application and running tests repeatedly to identify inconsistent results.
James Waldrop from Twitter discussed their ongoing drive to eliminate the fail whale through performance testing. He discussed production testing techniques: canaries (a small subset of users is given new functionality), dark traffic (keep using the existing app but send some traffic to the new version and throw away the response), and tap compare, which is comparing dark traffic responses to the actual ones. He then talked about his homegrown performance tool Iago (commonly called Lago because of the capital I in sans-serif fonts).
Malini Das and David Burns from Mozilla discussed automated testing of the FirefoxOS mobile operating system and how it uses WebDriver extensively to test the inner context (content) and outer context (chrome) of FirefoxOS. They have a neat Panda Board (headless devices) device pool which can cause non-deterministic test failures due to hardware failure. One key point was how important volume/soak testing is, as people don’t turn off their phones: they expect them to run without rebooting them or turning them off.
Igor Dorovskikh and Kaustubh Gawande from Expedia discussed Expedia’s approach to test driven continuous delivery. Interestingly they use Ruby for their automated integration and acceptance tests even though the programmers write their web application in Java. Having a green build light is critical to them, which means a failed build rolls back automatically after 10 minutes: giving someone 10 minutes to check in a fix. To enable this, they have created a build coach role which is shared amongst the team; even project managers and directors can take on this role to keep the build green. They also stated that running mobile web app tests on real devices and emulators (using WebDriver) has been beneficial, as well as standard browser user agent emulation to get around issues with multiple windows for features like Facebook authentication.
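Dark traffic and tap compare fit together quite naturally, so here is a small sketch of the pattern as I understood it. It is not Twitter's implementation: the handler functions and the shape of the mismatch log are invented for the example.

```python
from typing import Callable, List, Tuple


def tap_compare(
    request: str,
    production: Callable[[str], str],
    candidate: Callable[[str], str],
    mismatches: List[Tuple[str, str, str]],
) -> str:
    """Serve the production response, and quietly compare the candidate's response to it."""
    live = production(request)
    try:
        dark = candidate(request)  # dark traffic: the user never sees this response
    except Exception as exc:
        mismatches.append((request, live, repr(exc)))
    else:
        if dark != live:
            mismatches.append((request, live, dark))
    return live  # the candidate's response is thrown away


# Hypothetical handlers: the new version has a bug for one input.
def current_version(request: str) -> str:
    return request.upper()


def new_version(request: str) -> str:
    return "OOPS" if request == "whale" else request.upper()


log: List[Tuple[str, str, str]] = []
responses = [tap_compare(r, current_version, new_version, log) for r in ("tweet", "whale")]
print(responses)
print(log)
```

Users only ever get the existing version's response, while the log shows exactly where the new version would have behaved differently.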
David Röthlisberger from YouView demonstrated automated set top box testing which uses a video capture comparison tool that compares expected images – similar to Sikuli. These images are stored in a library should the application change in look and feel.
Ken Kania from Google discussed ChromeDriver 2.0 and its advanced support for mobile Chrome browsers.
Simon Stewart from Facebook talked about Android application testing at Facebook. Originally Facebook used Web Views in Android & iOS which enabled frequent deployment but resulted in a terrible user experience. They have since started developing native applications for each feature. Interestingly, every feature team has responsibility for all platforms: web, mobile web, Android and iOS. This enables feature parity across platforms. Facebook use their own build tool, Buck, which enables faster builds. Simon also pointed out that engineers are entirely responsible for testing at Facebook: they have no test team, no QA department or testers employed. Some engineers are passionate about testing, just as others are passionate about databases. Dogfooding is very common amongst engineers, which results in edge cases being discovered before being released to Production. A highly entertaining talk.
Google really know how to run a conference. It’s hands-down the smoothest one I’ve attended, from the sign-in process to the schedule being adhered to. They even have stenographers and sign language interpreters.
Oh, and NYC is great. I went to the top of the Empire State Building yesterday: the view to lower Manhattan was amazing.