One of the themes I talked about during my presentation in Wellington was the blurry line between test and development in a distributed environment like Automattic.
I was recently having trouble with a complex method in our WordPress.com e2e test page objects, so I used my skills as a developer and wrote a change to our user interface which adds a data attribute to the HTML element.
This meant our page object method immediately went from this:
Do you actively close bugs because they reach a certain age?
One of the (many) things I love about Automattic is the attention that is given to bug triage. Bug triage is the habit of continually grooming our bug lists to ensure they are constantly relevant, updated and reflective of the current state of our products. A benefit of this is that an up-to-date and prioritized bug list translates directly into a backlog of maintenance work items for a product development team.
I recently saw this quote in an article by Nikita Hasis on Medium.
“If Your Test Leaders Aren’t Telling You To Write Code, They Are Lying!
Even if it’s by omission.
There’s this argument, almost daily, about whether software testers should learn programming. I’ll jump right in. It is unimaginable that someone would tell you NOT to learn something. That’s the first, and probably shittiest lie that inexperienced testers get fed. It’s further unimaginable, and downright irresponsible to tell people not to learn something that is very clearly where a large, well-paying, and above all interesting part of the industry is heading. Wanna work on innovative, data-driven projects with smart and driven people? You probably need to pull up terminal and at least get your toes wet, y’all.
The worst part of the lie is that it imposes that coding is a difficult grind and will only cause more problems than it solves. I even saw Alister Scott’s blog post referenced as an argument against coding, ironic as it is.”
I have been testing web apps for over ten years, and making cross-browser testing “suck less” has been and still is a top goal of mine. I recognize that visual presentation/layout must be reviewed by human eyes, but given the growing number of OS/device/browser combinations we need to support/test, I feel like I’m missing an opportunity to streamline things every time I spin up a dozen VMs to check a new page.
Here’s what I do currently, using an online tool that provides access to various OS/device/browser combinations
1. I spin up a VM for an OS/device/browser combo I’m checking and check the page
2. Repeat step 1 for each combo I need to check
I have done a little bit of research on the tool’s APIs and I think I could at least automate the process of spinning up each combination I need.
I have also tried tools that purport to be able to play back your recorded Selenium IDE steps in whichever configurations you choose, but it didn’t work very well even if I took the time to update the recorded steps to use reliable locators.
Also, while we do have automated smoke and regression suites using Selenium, I have not been exposed to or thought of an automated approach to checking page layout that doesn’t immediately seem like it would be awful to maintain (other than perhaps just recording screencasts while interacting with each page and having a human review them).
So: How do you approach cross-browser testing for new feature development and for regression purposes?
Thanks so much for your AMA and I hope you pick my question!
I’ll split the response into two parts: what I recommend for cross-browser regression testing, and what I recommend for cross-browser new feature testing.
Cross-browser Testing for Regression Purposes
I am still on the opinion that there’s little-to-no return on investment (ROI) in running automated functional regression tests across different browsers. My approach is to typically understand what your most used customer browser is (most likely Chrome) and automate your e2e regression tests against that. I’m still of the opinion, even though tools like Selenium-WebDriver have multi browser support, that maintaining a suite of e2e tests that work consistently across multiple browsers is an onerous task. The one variant that that I do like to automatically test is different screen resolutions, as fully responsive web applications can functionally behave differently at different screen widths in the same browser. At WordPress.com, for example, we run our e2e tests against three screen sizes in Chrome (mobile, tablet, and desktop).
We also run automated visual comparison tests to ensure we don’t introduce unexpected variances in our interface design/appearance. These run in the same three sizes in a single browser (which happens to be Firefox for technical reasons). They have some dynamic content capability so if the layout of the page looks okay, but the content is slightly different, then they still pass. There still is an additional overhead in maintaining these in addition to our functional tests though.
Whilst automated e2e tests are great to cover key scenarios for regression purposes, I have found it also very useful to supplement this with continuous exploratory testing of existing functionality in real world use (dogfooding) in different browsers, different operating systems and on different devices. This picks up real human issues that our automated e2e and visual comparison tests don’t find.
We are huge believers in continuous dogfooding at WordPress.com to the extent that we recently built a Slack ‘testbot’ that suggests both a real user flow and a browser/OS to test that on for when you feel like testing something. For example:
alisterscott: I am looking for something to test
testbot: @alisterscott: Try creating a new post making sure you add some media in IE10
Cross-browser Testing for New Feature Testing
I don’t believe you can test all new features on all browsers (unless you have a really big team maybe). So you can either take a risk based approach (test the most used browsers first), or you can just mix it up and test different features in different browsers.
Sometimes there may be exceptions, I recently tested a upgraded version of our WYSIWYG editor and I wanted to be sure that this worked on various browsers – even upcoming ones which is what the new editor was adding support for.
Our WordPress.com admin interface Calypso only supports IE10 and Edge, so if I want to test in either of those, I use a freely, legally available Microsoft VM running in VirtualBox on OSX to test this. These VMs work really well.
To summarise, cross-browser testing still sucks, but it’s still a thing we need to do, especially when we have diverse groups of users with different devices and browsers. There is a trend towards browser vendors fully embracing/adopting open browser/web standards so hopefully browser specific bugs, or quirks, will soon become a thing of the past. For example Microsoft Edge is a much nicer browser to develop for and test than previous Internet Explorer versions. One can only hope and pray.
hey, i like your blog, i’m reading your blog on a daily basis, and i just want to ask you what i can i do to enhance my skills, knowledge and to be a be good in functional testing (manual and automation) IF i’m the ONLY tester in my current company (performing all testing activities), but i feel that i have a mess in my head and lots to learn to be up to date with the last trends.
my question now “What to learn, When(everyday?) and How in case your are the only tester in your company”.
i hope to answer my question soon to make my head calm down
I have fond memories of a project where I worked as the solo tester on a software delivery team; we had something like 8 developers (including one lead who took on iteration management tasks) and me as a tester. That was it. I loved it because we established really good unhindered rapport with our business stakeholders, and I was always busy!
But getting back to your question: when I was working in that situation it was vital that we had developers working on as much test automation as possible, alongside functional development, since there was just too much functional testing to do for a single person also doing test automation. I found myself spending about 80% of my time just testing new functionality and spot-testing different browsers/devices etc. The remaining 20% I spent ensuring we had good regression test coverage through the automated tests that developers were writing and I was helping maintain. So first and foremost I would strongly encourage you to get the developers you work with to have as much responsibility as possible for automated tests.
If you have this in place you should be able to take a deep breath and spend more time doing quality testing. I have found the best way to learn is on the job, you’ll be in a good position to do this, and often you’ll learn best by making your own mistakes.
If you want to know more about how to become better at what you do, I’ve shared some tips in a previous answer. There’s also my Pride & Paradev book which I wrote whilst working as a solo tester on a team.
Imagine if someone came to you and told you that your website was causing their laptop to throw a fatal system error, the dreaded ‘blue screen of death‘, what would your response be?
Well I know what my response would be because it happened to me. My response was “no way! That can’t happen! A website can’t make a computer BSOD!” I would have bet $1000 on that. Turns out I was wrong; very wrong.
I was working for a very popular pizza delivery chain and one day our development team began receiving reports of customer’s complaints (mostly via social media) that our site was causing their computers to throw blue screen of death errors! We laughed about it, yeah right, that can’t happen! Crash a browser tab maybe, but not an operating system. But we tried to reproduce it anyway on the large number of laptops we had and no matter how hard we tried we couldn’t reproduce it.
A few days later a member of our customer support team appeared saying our site just BSOD’d his laptop! We were curious, very curious. So we started the laptop back up and visited our site and voila! BSOD!
Now here’s where you might not believe me, so I took a video of it as proof, for your enjoyment:
Now that we had a single laptop that consistently reproduced the BSOD we could work out why it was happening.
It was only happening in Chrome, and only on this single laptop which ran Windows 8. We built/ran our site on a developer’s machine and accessed it via this laptop and could reproduce the crash every time.
We noted that the version of Chrome was one version behind the latest version on every other laptop we had – the update had somehow stopped and was stuck at that version.
But we didn’t update Chrome as that would have destroyed our single machine that was reproducing our issue! (It is pretty much impossible to get find older versions of Chrome).
After some time, it was a lengthy process as each test would result in a reboot, when we removed a resource reference to a font our site used, which was actually a Google Font on their CDN, it suddenly stopped BSODing. We added it back and voila; it crashed.
A reference to a Google Font on our site was giving our customers Blue Screens of Death.
After some research we discovered it was a Chromium bug which affected all versions of Windows (only), as Chromium/Chrome were working on native font rendering for Windows. Google were very quick to patch this issue, however, if someone was stuck on an older version then it would still be an issue: there wasn’t anything we could do about it but inform our customers to ensure they are on the latest version of Chrome.
I learned a few lessons during that day:
bugs can happen anywhere and cause damage that you can’t imagine;
bugs aren’t always in your control: we didn’t write bad software to crash our customer’s machines – this wasn’t tied to a particular release that we did. You can’t just test changes to your site and expect it to be okay;
you can’t find every bug: to find this bug we would have had to constantly check our upcoming and production site against every upcoming version of Chrome on every operating system. Chrome isn’t like IE with releases every few years, you’d almost need a full time tester just to perform this role; and
For something different, let me start with a quote:
“We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.”
~ T. S. Eliot
As a father of three children, I believe humans are innate explorers. So exploring a system should come natural to most people, but I’ve found a lot of people can explore a system and not find any bugs.
Techniques like session-based testing attempt to introduce measurement and control to exploratory testing so people are meant to be more effective, but, like the gorilla basketball video has shown us, introducing a goal for a session can blind us to the things that aren’t specifically in that goal. Much like following a script can blind us to things that aren’t in that script.
So how do we teach people to find bugs by exploration?
I believe the biggest thing that stops people finding bugs by exploration is wilful blindness: choosing not to know. The way you can teach someone to be a better exploratory tester therefore is by teaching them to be less blind.
Margaret Heffernan explains this superbly well in her completely non-testing non-technical book that I think every tester should read:
“We make ourselves powerless when we choose not to know. But we give ourselves hope when we insist on looking. The very fact that wilful blindness is willed, that it is a product of a rich mix of experience, knowledge, thinking, neutrons and neuroses, is what gives us the capacity to change it. Like Lear, we can learn to see better, not just because our brain changes but because we do. As all wisdom does, seeing starts with simple questions: what could I know, should I know, that I don’t know? Just what am I missing here?”
Sometimes the simplest solution is the best solution.
I like the idea of attaching screen recordings to bug reports to demonstrate an issue as they provide more context than just a single static screenshot or a series of screenshots.
In the past I have used QuickTime on Mac OSX to record a video file and attach that, but attaching/hosting/streaming video files is quite complicated for bug reports. At Automattic I have noticed people attaching animated GIFs instead: why didn’t I think of that?!?
There’s a really cool, yet oddly named, open source tool to capture screen recordings as animated GIFs: LICEcap. It works on Mac and Windows and allows you to easily choose a section of your screen, record it and save it as a GIF.
Here’s a quick recording of how to use LICEcap I just made (LICEcap recording LICEcap – how meta):
The best part is it’s so easy to attach the GIF to a Github issue, or a blog post, or in a Slack chat, and it’s instantly viewable by pretty much anyone.
Everywhere I turn I hear people talking about microservice architectures: it definitely feels like the latest, over-hyped, fad in software development. According to Martin Fowler:
“…the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.”
But what does this mean for software testing? And how does it work in the real world?
Well, my small team is responsible for maintaining/supporting a system that was developed from scratch using a microservices architecture. I must highlight I wasn’t involved in the initial development of system but I am responsible for maintaining/expanding/keeping the system running.
The system consists of 30-40 REST microservices each with it’s own code-base, git repository, database schema and deployment mechanism. A single page web application (build in AngularJS) provides a user interface to these microservices.
Whilst there are already many microservices evangelists on board the monolith hate-train; my personal experience with this architectural style has less than pleasant for a number of reasons:
There is a much, much greater overhead (efficiency tax) involved in automating the integration, versioning and dependency management of so many moving parts.
Since each microservice has its own codebase, each microservice needs appropriate infrastructure to automatically build, version, test, deploy, run and monitor it.
Whilst its easy to write tests that test a particular microservice, these individual tests don’t find problems between the services or from a user experience point of view, particularly as they will often use fake service endpoints.
Microservices are meant to be fault tolerant as they are essentially distributed systems that are naturally erratic however since they are micro, there’s lots of them which means the overhead of testing various combinations of volatility of each microservice is too high (n factorial)
Monolithic applications, especially written in strongly typed/static programming languages, generally have a higher level of application/database integrity at compile time. Since microservices are independent units, this integrity can’t be verified until run time. This means more testing in later development/test environments, which I am not keen on.
Since a lot of problems can’t be found in testing, microservices put a huge amount of emphasis on monitoring over testing. I’d personally much rather have confidence in testing something rather than relying on constant monitoring/fixing in production. Firefighting in production by development teams isn’t sustainable and leads to impacted efficiency on future enhancements.
I can understand some of the reasoning behind breaking applications down into smaller, manageable chunks but I personally believe that microservices, like any evangelist driven approach, has taken this way too far.
I’ll finish by giving a real world metric that shows just how much overhead and maintenance is involved in maintaining our microservices architected system.
A change that would typically take us 2 hours to patch/test/deploy on our ‘monolithic’ strongly typed/static programming language system typically takes 2 days to patch/test/deploy on our microservices built system. And even then I am much less confident that the change will actually work when it gets to production.
Don’t believe the hype.
Addendum: Martin Fowler seems to have had a change of heart in his recently published ‘Microservice Premium’ article about when to use microservices:
“…my primary guideline would be don’t even consider microservices unless you have a system that’s too complex to manage as a monolith. The majority of software systems should be built as a single monolithic application. Do pay attention to good modularity within that monolith, but don’t try to separate it into separate services.”
Does your organization conduct extensive post-release testing in production environments?
If you do, then it shows you probably have an unhealthy testing process, and you’ve fallen into the “let’s just test it in production” trap.
If testing in non-production environments was reflective of production behaviour, there would be no need to do production testing at all. But often testing isn’t reflective of real production behaviour, so we test in production to mitigate the risk of things going wrong.
It’s also the case that often issues are found in a QA environment don’t appear in a local development environment.
But it makes much more sense to test in an environment as close to where the code was written as possible: it’s much cheaper, easier and more efficient to find and fix bugs early.
For example, say you were testing a feature and how it behaves across numerous times of day across numerous time zones. As you progress through different test environments this becomes increasingly difficult to test:
In a local development environment: you could fake the time and timezone to see how your application behaves. In a CI or QA environment: you could change a single server time and restart your application to see how your application behaves under various time scenarios: not as easy as ‘faking’ the time locally but still fairly easy to do. In a pre-production environment: you’ll probably have clustered web servers so you’ll be looking at changing something like 6 or 8 server times to test this feature. Plus it will effect anyone else utilizing this system. In a production environment: you’ll need to wait until the actual time to test the feature as you won’t be able to change the server times in production.
Clearly it’s cheaper, easier and more efficient to test changing times in an environment closer to where the code was written.
You should aim to conduct as much testing as you can in earlier test environments and taper this off so by the time you can a change into production you’ll be confident that it’s been tested comprehensively. This probably requires some change to your testing process though.
How to Remedy A ‘Test in Production’ Culture
As soon as you find an issue in a later environment, ask why wasn’t this found in an earlier environment? Ultimately ask: why can’t we reproduce this in a local environment?
Some Hypothetical Examples
Example Two: our tests failed in pre-production that didn’t fail in QA because pre-production has a regular back up of the production database whereas QA often gets very out of date. We schedule a task to periodically restore the QA database from a production snapshot to ensure the data is reflective.
Example Three: our tests failed in production that didn’t fail in pre-production as email wasn’t being sent in production and we couldn’t test it in pre-production/QA as we didn’t want to accidentally send real emails. We configure our QA environments to send emails, but only to a white-list of specified email addresses we use for testing to stop accidental emails. We can be confident that changes to emails are tested in QA.
It’s easy to fall into a trap of just testing things in production even though it’s much more difficult and risky: things often go wrong with real data, the consequences are more severe and it’s generally more difficult to comprehensively test in production as you can’t change or fake things as easily.
Instead of just accepting “we’ll test it in production”, try instead to ask, “how can we test this much earlier whilst being confident our changes are reflective of actual behaviour?”
You’ll be much less stressed, your testing will be much more efficient and effective, and you’ll have a healthier testing process.
At the Brisbane Software Testers Meetup last week there was a group discussion about the requirement to test beyond requirements/acceptance criteria and if you’re doing so, how much is enough? Where do you draw the line? It came from an attendee who had a manager pull him up for a production bug that wasn’t found in testing but wasn’t in the requirements. If it wasn’t in the requirements, how could he test it?
In my opinion, testing purely against requirements or acceptance criteria is never enough. Here’s why.
Imagine you have a set of perfectly formed requirements/acceptance criteria, we’ll represent as this blue blob.
Then you have a perfectly formed software system your team has built represented by this yellow blob
In a perfect, yet non-existent, world, all the requirements/acceptance criteria are covered perfectly by the system, and the system exists of only the requirements/acceptance criteria.
But in the real world there’s never a perfect overlap. There’s requirements/acceptance criteria that are either missed by the system (part A), or met by the system (part B). These can both be easily verified by requirements or acceptance criteria based testing. But most importantly, there are things in your system that are not specified by any requirements or acceptance criteria (part C).
These things in part C often exist of requirements that have been made up (assumptions), as well as implicit and unknown requirements.
The biggest flaw about testing against requirements is that you won’t discover these things in part C as they’re not requirements! But, as shown by the example from the tester meetup, even though something may not be specified as a requirement, the business can think they’re a requirement when it effects usage.
Software development should aim to have as few assumptions, implicit and unknown requirements in a system as reasonably possible. Different businesses, systems and software have different tolerances for how much effort is spent on reducing the size of these unknowns, so there’s no one size fits all answer to how much is enough.
But there are two activities that a tester can perform and champion on a team which can drastically reduce the size of these unknown unknowns.
1 – User Story Kick-Offs: I have only worked on agile software development teams over the last number of years so all functionality that I test is developed in the form of a user story. I have found the best way to reduce the number of unknown requirements in a system is to make sure every user story is kicked-off with a BA, tester and developer (often called The Three Amigos) all present and each acceptance criterion is read aloud and understood by the three. At this point, as a tester, I like to raise items that haven’t been thought of so that these can be specified as acceptance criteria and are unlikely to either make it or not make it into the system by other means or assumptions.
2 – Exploratory Testing: As a tester on an agile team I make time to not only test the acceptance criteria and specific user stories, but to explore the system and understand how the stories fit together and to think of scenarios above and beyond what has been specified. Whilst user stories are good at capturing vertical slices of functionality, their weakness, in my opinion, is they are just a ‘slice’ of functionality and often cross-story requirements may be missed or implied. This is where exploratory testing is great for testing these assumptions and raising any issues that may arise across the system.
I don’t believe there’s a clear answer to how much testing above and beyond requirements/acceptance criteria is enough. There will always be things in a system that weren’t in the requirements and as a team we should strive to reduce the things that fall into that category as much as possible given the resources and time available. It isn’t just the testers role to either just test requirements or be solely responsible/accountable for requirements that aren’t specified, the team should own this risk.
I test in production way too much for my liking (more details in an upcoming blog post).
Testing in production is risky, especially because I test in a lot of different environments and they all look the same. I found the only way I could tell which environment I was in was by looking closely at the URL. This was problematic as it led to doing things in a production environment thinking I was using a pre-production or test environment – oops.
I initially thought about putting some environment specific code/CSS into our apps that made the background colour different for each environment, but the solution was complex and it still couldn’t tell me I was using production from a glance.
I recently found the Stylebot extension for Chrome that allows you to locally tweak styles on any websites you visit. I loaded this extension and added our production sites with the background colour set to bright red, so now I immediately know I am using production as it’s bright red, be extra careful.
I’ve also set some other environments to be contrasting bright colours (purple, yellow etc.) so I am know from a quick glance what environment I am using.
I like this solution as I haven’t had to change any of our apps at all and it works in all environments: which is just what I needed.
Do you do something similar? Leave a comment below.
Software testers shouldn’t write code. There I’ve said it.
“If you put too much emphasis on those [automated test] scripts, you won’t notice misaligned text, hostile user interfaces, bad color choices, and inconsistency. Worse, you’ll have a culture of testers frantically working to get their own code working, which crowds out what you need them to do: evaluate someone else’s code.”
I used to think that you could/should teach testers to write code (as it will make them better testers), but I’m now at a point where I think that it’s a bad idea to teach testers to code for a number of reasons:
A software tester’s primary responsibility/focus should always be to test software. By including a responsibility to also write code/software takes away from that primary focus. Testers will get into a trap of sorting out their own coding issues over doing their actual job.
If a software tester wants their primary focus to be writing code, they should become a software programmer. A lot of testers want to learn coding not because they’ll be a better tester, but they want to earn more money. These testers should aim to be become programmers/developers if they want to code or think they can earn more money doing that.
Developing automated tests should be done as part of developing the new/changed functionality (not separately). This has numerous benefits such as choosing the best level to test at (unit, integration etc.) at the right time. This means there isn’t a separate team lagging behind the development team for test coverage.
Testers are great at providing input into automated test coverage but shouldn’t be responsible for creating that coverage. A tester working with a developer to create tests is a good way to get this done.
I think the software development industry would be a lot better if we had expectations on programmers to be responsible for self-tested code using automated tests, and testers to be responsible for testing the software and testing the the automated tests. Any tester wanting to code will move towards a programming job that allows them to do that and not try to change what is expected of them in their role.
Update 19th Jan 2015: this post seems to have triggered a lot of emotion, let me clarify some things:
A tester having technical skills isn’t bad: the more technical skills the tester has the better – if they can interrogate a database or run a sql trace then they’ll be more efficient/effective at their job – and a tester can be technical without knowing how to code
I don’t consider moving from testing into programming by any means the only form of career advancement: some testers hate coding and that’s fine, other’s love coding and I think it would be beneficial for them to become a programmer if they want to code more than they test.
I still believe everyone should take responsibility for their own career rather than expecting their employer/boss/industry leader/blogger to do it for them (more about this here).
The developer:tester ratio question comes up a lot and I find most, if not all, answers are “it depends”.
I won’t say “it depends” (it’s annoying). I will tell you what works for me given my extensive experience, but will provide some caveats.
I’ve worked on different agile software development teams as a tester for a number of years and I personally find a ratio of 8:1 developers to tester(s) (me) works well (that’s 4 dev-pairs if pair programming). Any less developers and I am bored; any more and I have too must to test and cycle time is in jeopardy.
I’m an efficient tester and the 8:1 ratio works well when there’s 8 equally efficient programmers on the team – if the devs are too slow, or the user stories are too big, I get bored;
Everyone in the team is responsible for quality; I have to make sure that happens;
A story must be kicked off with the tester (me) present so I can question any assumptions/anomalies in the acceptance criteria before any code is written;
A story is only ready for test if the developer has demonstrated the functionality to me at their workstation (bonus points in an integrated environment) – we call this a ‘shoulder check’ – much the same way as monkeys check each others shoulders for lice;
A story is also only ready for test if the developer has created sufficient and passing automated test coverage including unit tests, integration tests (if appropriate) and some acceptance tests; and
Bug fixes take priority over new development to ensure flow.
Deciding to have lots of (automated) checks [sic: tests] is like deciding to have lots of children. It’s fun at first, but later…
I read it a number of times and each time I read it I disagreed with it a little more.
As a proud father of three beautiful boys, I truly believe having lots of children is fun at first AND fun later on. Sure, having lots of kids is hardest thing you’ll ever do and continues to be hard as each day goes by, but hard and fun aren’t opposites or mutually exclusive whatsoever1; I’ve actually found them to be strongly correlated (think of your funnest job: was it easy?). So don’t let anybody put you off having lots of kids ever, because they are still loads of fun later on (assuming you’re not scared of hard work). I love my boys: they’re the funnest people I know and they get funner every day.
As a developer of software, I also believe having lots of automated tests is fun later on, on the proviso that you’ve put thought into them upfront. I truly believe the only way to make sustainable software that you can change and refactor with confidence is to develop it using self-testing code. Sure, having too many automated e2e tests can be a PITA2 but I’d choose lots of automated tests over no or very few automated tests any day of the week3. Again, don’t let someone put you off having lots of automated tests: just do them right!
I asked James Bach on Twitter about his quote (and how many children he has, the answer is one), and in the typical self-righteous context driven testing ‘community’ style I was called ‘reckless’ for choosing to have three beautiful boys with my lovely wife.
@alisterscott How does someone so reckless as to have multiple children lecture a more parsimonious father about economics?
It didn’t end there with other members of the ‘community’ doing what they do4 and taking the opportunity to jump in uninvited, attack me for even wondering how someone with only one child can comment on having lots of children, and try to intimidate me by accusing me of using ‘ad-hominem’ falacies/attacks against James Bach (they like big words).
This entire episode reaffirms my choice to have nothing whatsoever to do with the context driven testing ‘community’ and anyone who associates themselves with it (which started by me deleting my twitter account so they can’t attack me or have anything to do with me).
My final word of warning to those of you who still consider yourself part of that ‘community’, a comment about ‘context-driven testing’:
“I chose not to engage in those dogmatic discussions. I once had a job interview where the term context-driven led one of the devs to do some googling. I had to defend myself for affiliating as he’d found some right contentious and dogmatic stuff and wondered if I were some kind of extremist for including that term in my resume. It’s no longer in my resume, FWIW.”
 I recently read that happiness and unhappiness aren’t actually the opposite of one another: you can be both happy and unhappy at the same time.
 In case you didn’t know: PITA means ‘pain in the ass’, and lots of end to end tests are a pain in the ass. There’s lots of articles on here about why, the most recent one being about Salesforce.com and its 100,000 e2e tests.
 FWIW most codebases I have worked on have had zero to little automated tests, so I don’t think having too many automated tests is our common industry problem.
 It’s not hard to find examples of where members of this ‘community’ rally against and intimidate a particular person they disagree with on twitter, for examples: here, here, here, here, here, etc. I personally know a fellow tester who had a very similar negative experience to me a couple of years ago and has since distanced herself also.
This story begins with a promo email I received from Sauce Labs…
“Ever wondered how an Enterprise company like Salesforce runs their QA tests? Learn about Salesforce’s inventory of 100,000 Selenium tests, how they run them at scale, and how to architect your test harness for success”
100,000 end-to-end selenium tests and success in the same sentence? WTF? Sounds like a nightmare to me!
I dug further and got burnt by the molten lava: the slides confirmed my nightmare was indeed real:
“We test end to end on almost every action.”
Ouch! (and yes, that is an uncredited image from my blog used in the completely wrong context)
But it gets worse. Salesforce have 7500 unique end-to-end WebDriver tests which are run on 10 browsers (IE6, IE7, IE8, IE9, IE10, IE11, Chrome, Firefox, Safari & PhantomJS) on 50,000 client VMs that cost multiple millions of dollars, totaling 1 million browser tests executed per day (which equals 20 selenium tests per day, per machine, or over 1 hour to execute each test).
My head explodes! (and yes, another uncredited image from this blog used out of context and with my title removed).
But surely that’s only one place right? Not everyone does this?
A few weeks later I watched David Heinemeier Hansson say this:
“We recently had a really bad bug in Basecamp where we actually lost some data for real customers and it was incredibly well tested at the unit level, and all the tests passed, and we still lost data. How the f*#% did this happen? It happened because we were so focused on driving our design from the unit test level we didn’t have any system tests for this particular thing.
…And after that, we sort of thought, wait a minute, all these unit tests are just focusing on these core objects in the system, these individual unit pieces, it doesn’t say anything about whether the whole system works.”
~ David Heinemeier Hansson – Ruby on Rails creator
“…layered on top is currently a set of controller tests, but I’d much rather replace those with even higher level system tests through Capybara or similar. I think that’s the direction we’re heading. Less emphasis on unit tests, because we’re no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests (Which btw do not need to be so slow any more, thanks to advances in parallelization and cloud runner infrastructure).”
~ David Heinemeier Hansson – Ruby on Rails creator
I started to get very worried. David is the creator of Ruby on Rails and very well respected within the ruby community (despite being known to be very provocative and anti-intellectual: the ‘Fox News’ of the ruby world).
But here is dhh telling us to replace lower level tests with higher level ‘system’ (end to end) tests that use something like Capybara to drive a browser because unit tests didn’t find a bug and because it’s now possible to parallelize these ‘slow’ tests? Seriously?
Speed has always seen as the Achille’s heel of end to end tests because everyone knows that fast feedback is good. But parallelization solves this right? We just need 50,000 VMs like Salesforce?
Firstly, parallelization of end to end tests actually introduces its own problems, such as what to do with tests that you can’t run in parallel (for example, ones that change global state of a system such as a system message that appears to all users), and it definitely makes test data management trickier. You’ll be surprised the first time you run an existing suite of sequential e2e tests in parallel, as a lot will fail for unknown reasons.
Secondly, the test feedback to someone who’s made a change still isn’t fast enough to enable confidence in making a change (by the time your app has been deployed and the parallel end-to-end tests have run; the person who made the change has most likely moved onto something else).
But the real problem with end to end tests isn’t actually speed. The real problem with end to end tests is that when end to end tests fail, most of the time you have no idea what went wrong so you spend a lot of time trying to find out why. Was it the server? Was it the deployment? Was it the data? Was it the actual test? Maybe a browser update that broke Selenium? Was the test flaky (non-deterministic or non-hermetic)?
“…unlike unit tests, the functional tests don’t tell you what is broken or where to locate the failure in the code base. They just tell you something is broken. That something could be the test, the browser, or a race condition. There is no way to tell because functional tests, by definition of being end-to-end, test everything.”
So what’s the answer? You have David’s FUD about unit testing not catching a major bug in BaseCamp. On the other hand you need to face the issue of having a large suite of end to end tests will most likely result in you spending all your time investigating test failures instead of delivering new features quickly.
If I had to choose just one, I would definitely choose a comprehensive suite of automated unit tests over a comprehensive suite of end-to-end/system tests any day of the week.
Why? Because it’s much easier to supplement comprehensive unit testing with human exploratory end-to-end system testing (and you should anyway!) than trying to manually verify units function from the higher system level, and it’s much easier to know why a unit test is broken as explained above. And it’s also much easier to add automated end-to-end tests later than trying to retrofit unit tests later (because your code probably won’t be testable and making it testable after-the-fact can introduce bugs).
To answer our question, let’s imagine for a minute that you were responsible for designing and building a new plane. You obviously need to test that your new plane works. You build a plane by creating parts (units), putting these together into components, and then putting all the components together to build the (hopefully) working plane (system).
If you only focused on unit tests, like David mentioned in his Basecamp example, you could be pretty confident that each piece of the plane would be have been tested well and works correctly, but wouldn’t be confident it would fly!
If you only focussed on end to end tests, you’d need to fly the plane to check the individual units and components actually work (which is expensive and slow), and even then, if/when it crashed, you’d need to examine the black-box to hopefully understand which unit or component didn’t work, as we currently do when end-to-end tests fail.
But, obviously we don’t need to choose just one. And that’s exactly what Airbus does when it’s designing and building the new Airbus A350:
As with any new plane, the early design phases were riddled with uncertainty. Would the materials be light enough and strong enough? Would the components perform as Airbus desired? Would parts fit together? Would it fly the way simulations predicted? To produce a working aircraft, Airbus had to systematically eliminate those risks using a process it calls a “testing pyramid.” The fat end of the pyramid represents the beginning, when everything is unknown. By testing materials, then components, then systems, then the aircraft as a whole, ever-greater levels of complexity can be tamed. “The idea is to answer the big questions early and the little questions later,” says Stefan Schaffrath, Airbus’s vice president for media relations.
The answer, which has been the answer all along, is to have a balanced set of automated tests across all levels, with a disciplined approach to having a larger number of smaller specific automated unit/component tests and a smaller number of larger general end-to-end automated tests to ensure all the units and components work together. (My diagram below with attribution)
Having just one level of tests, as shown by the stories above, doesn’t work (but if it did I would rather automated unit tests). Just like having a diet of just chocolate doesn’t work, nor does a diet that deprives you of anything sweet or enjoyable (but if I had to choose I would rather a diet of healthy food only than a diet of just chocolate).
Now if we could just convince Salesforce to be more like Airbus and not fly a complete plane (or 50,000 planes) to test everything every-time they make a change and stop David from continuing on his anti-unit pro-system testing anti-intellectual rampage which will result in more damage to our industry than it’s worth.