The Rise of the Software Verifier


I found this article rather interesting. I’m still not sure if some of it is satire, forgive me if I misinterpreted it.

“DevOps has become so sophisticated that there is little fear of bugs. DevOps teams can now deploy in increments, monitor logs for misbehavior, and push a new version with fixes so fast that only a few users are ever affected. Modern software development has squeezed the testers out of testing.

Features are more important than quality when teams are moving fast. Frankly, when a modern tester finds a crashing bug with strange, goofy, or non-sensical input, the development team often just groans and sets the priority of the bug to the level at which it will never actually get fixed. The art of testing and finding obscure bugs just isn’t appreciated anymore. As a result, testers today spend 80% of their time verifying basic software features, and only 20% of their time trying to break the software.”

The author doesn’t say where the 80:20 figures came from, but the testers I have worked with for the last five years have spent zero time on manual regression verification and most of their time actually testing the software we were developing. How did we achieve this? Not by splitting our team into testers and verifiers as the author suggests:

“What to do about all this? The fix is a pretty obvious one. Software Verification is important. Software Testing is important. But, they are very different jobs. We should just call things what they are, and split the field in two. Software testers who spend their day trying to break large pieces of important software, and software verifiers, who spend their time making sure apps behave as expected day-to-day should be recognized for what they are actually doing. The world needs to see the rise of the “Software Verifier”.”

We did this by focussing on automating enough tests that we could release our software frequently, confident we weren’t introducing major regressions. This wasn’t 100% test coverage; it was just enough coverage to avoid human verification. We obviously spent effort maintaining these tests, but that was a whole-team effort, and it freed up the rest of our time for testing the software and looking for real-life bugs using human techniques.

Another thing I noted about the article was the use of the graph to show decreasing interest in software testing:

But even their interest in Software Testing is fading fast…



 
This also applies to software in general, perhaps even more dramatically:


I don’t think there’s a decreasing interest in software testing, or software, but rather these have become more commonplace and more commoditised, so people need to search for these less.

Creating a skills-matrix for t-shaped testers

I believe the expression “jack of all trades, master of none” is misleading, as I’ve mentioned previously. Being good at two or more complementary skills is better than being excellent at just one, in my opinion.

But what about being excellent at one skill, and still being good at two or more? Why can’t we be both?

Jason Yip describes a T-shaped person and the benefits that having t-shaped people on teams brings:

A T-shaped person is capable in many things and expert in, at least, one.
As opposed to an expert in one thing (I-shaped) or a “jack of all trades, master of none” generalist, a “t-shaped person” is an expert in at least one thing but also somewhat capable in many other things. An alternate phrase for “t-shaped” is “generalizing specialist”.

Image by Jason Yip

Ideally we’d like to have a team of t-shaped testers in Flow Patrol at Automattic. But how do we get to this end goal?

I recently embarked on an exercise to measure and benchmark our skills and do just this with our team. Here are the steps we took.

Step One – Devise Desired Team Skills

The first thing we did was come up with a list of skills that we have in the team and would like to have in the team. These can be ‘hard’ skills, like specific programming languages, and ‘soft’ skills, like triaging bugs. In a standard co-located team this would be as easy as conducting a brainstorming session and using affinity grouping to discover these skills. In a distributed environment, I wrote a blog post to my team’s channel and had individual members comment with a list of skills they thought appropriate; I then did the grouping and came up with a draft list of skills and groups.

Step Two – Self-assess against a team skills matrix

Once I had a final list of skills and groups (see below for the full list), I put together a matrix (in a Google Spreadsheet) that listed team members on the x-axis and the skills on the y-axis, and came up with a skill-level rating scale. Our internal systems use a three-level scale (Newbie, Comfortable, Expert), which we didn’t think was broad enough, so we decided upon five levels:

1. Limited
2. Basic
3. Good
4. Strong
5. Expert

 

Team Skills Matrix

I hadn’t seen Jason Yip’s visual representation at that point in time, otherwise I might have used something like it, which has five similar levels:

Image by Jason Yip

Step Three – Publish results and cross-skill

Once we had the self-assessments done we could publish the data within our organisation and use the benchmark to cross-skill people in the team. In a co-located environment this could involve pair programming; in a distributed one it could involve mentoring and reviewing other team members’ work.
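To make that cross-skilling step concrete, here’s a rough sketch (in JavaScript, since that’s what most of our tooling uses) of how the published matrix data could be used to spot gaps. The data shape, names and ratings below are invented for illustration; our real matrix lives in a Google Spreadsheet.

    // Invented example data: skill -> self-assessed level per team member.
    const levels = { Limited: 1, Basic: 2, Good: 3, Strong: 4, Expert: 5 };

    const matrix = {
      'Accessibility Testing': { alice: 'Limited', bob: 'Basic' },
      'Automated Visual Regression Testing': { alice: 'Expert', bob: 'Limited' },
    };

    // Flag skills where nobody rates themselves at 'Good' or above – these are
    // the first candidates for cross-skilling via mentoring or pairing.
    Object.entries(matrix).forEach(([skill, ratings]) => {
      const best = Math.max(...Object.values(ratings).map((r) => levels[r]));
      if (best < levels.Good) {
        console.log(`Cross-skilling gap: ${skill}`);
      }
    });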

Have you done a skills matrix for your team? How did you do it? What did you discover?


Full List of Skills and Skill Groups for Flow Patrol at Automattic

Automattic Product Knowledge
  • WordPress Core
  • WordPress.com Simple Sites
  • WordPress.com Atomic Sites
  • Jetpack
  • Woocommerce
  • Simplenote
  • Mobile Apps

Human Software Testing
  • Flow Mapping
  • Bug Triage & Prioritization
  • Exploratory Testing (pre-release)
  • Dogfooding
  • Cross-browser Cross-device Testing
  • Facilitating Beta/Community Testing
  • Facilitating User Testing
  • Usability Testing
  • Accessibility Testing

Automated Testing
  • Automated End-to-end Browser Testing
  • Automated API/Integration Testing
  • Automated Unit Testing
  • Automated Visual Regression Testing
  • Android Automated Testing
  • iOS Automated Testing

Programming Languages
  • JavaScript
  • PHP
  • Shell Scripting
  • Objective C
  • Swift
  • Android/Kotlin

Testing Tools/Frameworks
  • Mocha
  • WebDriverJS
  • Git/Github
  • CircleCI
  • TravisCI
  • Team City (CI)
  • Mailosaur
  • Applitools
  • VIP Go
  • Docker

Other
  • i18n Testing
  • Performance Testing
  • Security Testing
  • User advocacy – empathy and compassion
  • Mentoring/onboarding
  • Project Management
  • Product Management

Product Development
  • Calypso
  • Jetpack
  • WP.com API PHP
  • Woocommerce
  • iOS App
  • Android App

 

The blurry line between test and development

One of the themes I talked about during my presentation in Wellington was the blurry line between test and development in a distributed environment like Automattic.

I was recently having trouble with a complex method in our WordPress.com e2e test page objects, so I used my skills as a developer and wrote a change to our user interface which adds a data attribute to the HTML element.
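To illustrate the kind of change I mean, here’s a made-up sketch (not our real Calypso markup or page object code) of adding a data attribute to an element and then targeting it from a WebDriverJS page object method:

    // UI change (illustrative only): add a stable hook to the element.
    //   <button data-e2e-button="publish">Publish</button>

    // Page object method that targets the data attribute via WebDriverJS:
    const { By, until } = require('selenium-webdriver');

    async function clickPublishButton(driver) {
      const publishButton = By.css('button[data-e2e-button="publish"]');
      await driver.wait(until.elementLocated(publishButton), 10000);
      await driver.findElement(publishButton).click();
    }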

This meant our page object method immediately went from this:

Continue reading “The blurry line between test and development”

Test for Real Life

“Most of us are anxious pretty much all the time – but frequently imagine that other people aren’t. It’s time to admit the truth. Anxiety is just a basic fact about being human.”

~ Alain de Botton

We are all human, we are all worried and anxious pretty much all the time, people just don’t tell you that they are. We wear masks and we hide it well.

But why do we test like we’re not anxious or worried? Why don’t we test for real life?

Continue reading “Test for Real Life”

Make sure your end-to-end tests align with your company’s strategy

I recently embarked on writing some new automated end-to-end tests for an existing product that has been around for some time but has never had e2e automated tests written for it.

Continue reading “Make sure your end-to-end tests align with your company’s strategy”

Should you close old bugs?

Do you actively close bugs because they reach a certain age?

One of the (many) things I love about Automattic is the attention that is given to bug triage. Bug triage is the habit of continually grooming our bug lists to ensure they are constantly relevant, updated and reflective of the current state of our products. A benefit of this is that an up-to-date and prioritized bug list translates directly into a backlog of maintenance work items for a product development team.

Continue reading “Should you close old bugs?”

(Not) Lying about Writing Code

I recently saw this quote in an article by Nikita Hasis on Medium.

“If Your Test Leaders Aren’t Telling You To Write Code, They Are Lying!
Even if it’s by omission.

There’s this argument, almost daily, about whether software testers should learn programming. I’ll jump right in. It is unimaginable that someone would tell you NOT to learn something. That’s the first, and probably shittiest lie that inexperienced testers get fed. It’s further unimaginable, and downright irresponsible to tell people not to learn something that is very clearly where a large, well-paying, and above all interesting part of the industry is heading. Wanna work on innovative, data-driven projects with smart and driven people? You probably need to pull up terminal and at least get your toes wet, y’all.

The worst part of the lie is that it imposes that coding is a difficult grind and will only cause more problems than it solves. I even saw Alister Scott’s blog post referenced as an argument against coding, ironic as it is.”

~ Nikita Hasis (Medium)

Since Medium is a walled garden that doesn’t allow you to leave a comment without creating an account, I’ll leave my response here instead (where anyone is free to comment however they like).

Continue reading “(Not) Lying about Writing Code”

AMA: automation testing channel

stevenguyen87 asks…

Could you share us some automation testing channel that could help up update the news of testing trend also improve ourself for a better technical skill and problem solved

My response…

There’s an awesome blog/channel, right here on WordPress.com, that meets your needs perfectly: it’s called Five Blogs. Make sure you check it out; you can follow it for great, frequent updates.

AMA: Cross-browser Testing

Marisa Roman asks…

I have been testing web apps for over ten years, and making cross-browser testing “suck less” has been and still is a top goal of mine. I recognize that visual presentation/layout must be reviewed by human eyes, but given the growing number of OS/device/browser combinations we need to support/test, I feel like I’m missing an opportunity to streamline things every time I spin up a dozen VMs to check a new page.

Here’s what I do currently, using an online tool that provides access to various OS/device/browser combinations
1. I spin up a VM for an OS/device/browser combo I’m checking and check the page
2. Repeat step 1 for each combo I need to check

I have done a little bit of research on the tool’s APIs and I think I could at least automate the process of spinning up each combination I need.

I have also tried tools that purport to be able to play back your recorded Selenium IDE steps in whichever configurations you choose, but it didn’t work very well even if I took the time to update the recorded steps to use reliable locators.

Also, while we do have automated smoke and regression suites using Selenium, I have not been exposed to or thought of an automated approach to checking page layout that doesn’t immediately seem like it would be awful to maintain (other than perhaps just recording screencasts while interacting with each page and having a human review them).

So: How do you approach cross-browser testing for new feature development and for regression purposes?

Thanks so much for your AMA and I hope you pick my question!

My response…

I’ll split the response into two parts: what I recommend for cross-browser regression testing, and what I recommend for cross-browser new feature testing.

Cross-browser Testing for Regression Purposes

I am still of the opinion that there’s little-to-no return on investment (ROI) in running automated functional regression tests across different browsers. My approach is typically to work out what your customers’ most-used browser is (most likely Chrome) and automate your e2e regression tests against that. Even though tools like Selenium-WebDriver have multi-browser support, maintaining a suite of e2e tests that works consistently across multiple browsers is an onerous task. The one variant that I do like to automatically test is different screen resolutions, as fully responsive web applications can functionally behave differently at different screen widths in the same browser. At WordPress.com, for example, we run our e2e tests against three screen sizes in Chrome (mobile, tablet, and desktop).
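As a rough sketch of what “three screen sizes in one browser” can look like (using the selenium-webdriver 4 API; the widths and URL here are illustrative, not our actual configuration):

    const { Builder } = require('selenium-webdriver');

    // Illustrative widths only – pick breakpoints that match your own CSS.
    const screenSizes = {
      mobile: { width: 375, height: 667 },
      tablet: { width: 1024, height: 768 },
      desktop: { width: 1440, height: 900 },
    };

    async function runResponsiveCheck(sizeName) {
      const driver = await new Builder().forBrowser('chrome').build();
      try {
        // Same browser, different viewport: responsive apps can behave
        // differently at each width.
        await driver.manage().window().setRect(screenSizes[sizeName]);
        await driver.get('https://wordpress.com/');
        // ... run the same functional assertions at this width ...
      } finally {
        await driver.quit();
      }
    }

    runResponsiveCheck('mobile'); // and likewise for 'tablet' and 'desktop'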

We also run automated visual comparison tests to ensure we don’t introduce unexpected variances in our interface design/appearance. These run at the same three sizes in a single browser (which happens to be Firefox for technical reasons). They have some dynamic content capability, so if the layout of the page looks okay but the content is slightly different, they still pass. There is still additional overhead in maintaining these on top of our functional tests though.

Whilst automated e2e tests are great for covering key scenarios for regression purposes, I have also found it very useful to supplement them with continuous exploratory testing of existing functionality in real-world use (dogfooding) in different browsers, on different operating systems and on different devices. This picks up real human issues that our automated e2e and visual comparison tests don’t find.

We are huge believers in continuous dogfooding at WordPress.com, to the extent that we recently built a Slack ‘testbot’ that suggests both a real user flow and a browser/OS to test it on whenever you feel like testing something. For example:

alisterscott: I am looking for something to test
testbot: @alisterscott: Try creating a new post making sure you add some media in IE10
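The core idea is as simple as pairing a random flow with a random browser/OS. Here’s a toy sketch (flows and platform names invented; this isn’t the real bot’s code):

    const flows = [
      'creating a new post making sure you add some media',
      'changing your site theme',
      'inviting a new user to your site',
    ];
    const platforms = ['IE10', 'Edge', 'Safari on iOS', 'Chrome on Android', 'Firefox'];

    const pick = (list) => list[Math.floor(Math.random() * list.length)];

    // The real bot posts its suggestion to Slack; here we just print it.
    console.log(`Try ${pick(flows)} in ${pick(platforms)}`);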

Cross-browser Testing for New Feature Testing

I don’t believe you can test all new features in all browsers (unless you have a really big team, maybe). So you can either take a risk-based approach (test the most-used browsers first), or you can just mix it up and test different features in different browsers.

Sometimes there are exceptions: I recently tested an upgraded version of our WYSIWYG editor and I wanted to be sure that it worked in various browsers – even in upcoming ones, which is what the new editor was adding support for.

As for how you get access to these browsers to test, I develop and test mostly on OSX, so I test in Firefox, Chrome, Chrome Canary, Safari and Safari Technology Preview on OSX.

Our WordPress.com admin interface, Calypso, only supports IE10 and Edge, so if I want to test in either of those, I use a free, legally available Microsoft VM running in VirtualBox on OSX. These VMs work really well.

I know of people who prefer a cross-browser testing service like Sauce Labs, CrossBrowserTesting, BrowserStack, browserling or many others, like you’ve mentioned in your question.

If you’re just after some quick and free screenshots of a public page, you can also use this Microsoft utility.

Summary

To summarise, cross-browser testing still sucks, but it’s still something we need to do, especially when we have diverse groups of users with different devices and browsers. There is a trend towards browser vendors fully embracing/adopting open browser/web standards, so hopefully browser-specific bugs, or quirks, will soon become a thing of the past. For example, Microsoft Edge is a much nicer browser to develop for and test against than previous Internet Explorer versions. One can only hope and pray.

AMA: how to cope as a solo tester

Bodda asks…

hey, i like your blog, i’m reading your blog on a daily basis, and i just want to ask you what i can i do to enhance my skills, knowledge and to be a be good in functional testing (manual and automation) IF i’m the ONLY tester in my current company (performing all testing activities), but i feel that i have a mess in my head and lots to learn to be up to date with the last trends.
my question now “What to learn, When(everyday?) and How in case your are the only tester in your company”.

i hope to answer my question soon to make my head calm down

My response…

I have fond memories of a project where I worked as the solo tester on a software delivery team; we had something like 8 developers (including one lead who took on iteration management tasks) and me as a tester. That was it. I loved it because we established really good unhindered rapport with our business stakeholders, and I was always busy!

But getting back to your question: when I was working in that situation it was vital that we had developers working on as much test automation as possible, alongside functional development, since there was just too much functional testing to do for a single person also doing test automation. I found myself spending about 80% of my time just testing new functionality and spot-testing different browsers/devices etc. The remaining 20% I spent ensuring we had good regression test coverage through the automated tests that developers were writing and I was helping maintain. So first and foremost I would strongly encourage you to get the developers you work with to have as much responsibility as possible for automated tests.

If you have this in place you should be able to take a deep breath and spend more time doing quality testing. I have found the best way to learn is on the job; you’ll be in a good position to do this, and often you’ll learn best by making your own mistakes.

via The New Yorker

If you want to know more about how to become better at what you do, I’ve shared some tips in a previous answer. There’s also my Pride & Paradev book which I wrote whilst working as a solo tester on a team.

All the best.

The craziest bug I have ever seen

Imagine if someone came to you and told you that your website was causing their laptop to throw a fatal system error, the dreaded ‘blue screen of death’. What would your response be?

Well I know what my response would be because it happened to me. My response was “no way! That can’t happen! A website can’t make a computer BSOD!” I would have bet $1000 on that. Turns out I was wrong; very wrong.

I was working for a very popular pizza delivery chain, and one day our development team began receiving reports of customers complaining (mostly via social media) that our site was causing their computers to throw blue screen of death errors! We laughed about it: yeah right, that can’t happen! Crash a browser tab maybe, but not an operating system. But we tried to reproduce it anyway on the large number of laptops we had, and no matter how hard we tried we couldn’t.

A few days later a member of our customer support team appeared saying our site just BSOD’d his laptop! We were curious, very curious. So we started the laptop back up and visited our site and voila! BSOD!

Now here’s where you might not believe me, so I took a video of it as proof, for your enjoyment:

Now that we had a single laptop that consistently reproduced the BSOD we could work out why it was happening.

It was only happening in Chrome, and only on this single laptop which ran Windows 8. We built/ran our site on a developer’s machine and accessed it via this laptop and could reproduce the crash every time.

We noted that the version of Chrome was one version behind the latest version on every other laptop we had – the update had somehow stopped and was stuck at that version.

But we didn’t update Chrome, as that would have destroyed our single machine that was reproducing the issue! (It is pretty much impossible to find older versions of Chrome.)

Since we had our site running on a developer’s machine and reproducing the issue, all we could do was remove the HTML/CSS/JavaScript piece by piece until we discovered what was causing it.

After some time (it was a lengthy process, as each test resulted in a reboot), we removed a reference to a font our site used – which was actually a Google Font on their CDN – and the BSODs suddenly stopped. We added it back and voila; it crashed.

A reference to a Google Font on our site was giving our customers Blue Screens of Death.

After some research we discovered it was a Chromium bug which affected all versions of Windows (and only Windows), as Chromium/Chrome were working on native font rendering for Windows. Google were very quick to patch the issue; however, if someone was stuck on an older version it would still occur. There wasn’t anything we could do about it but inform our customers to make sure they were on the latest version of Chrome.

I learned a few lessons that day:

  • bugs can happen anywhere and cause damage that you can’t imagine;
  • bugs aren’t always in your control: we didn’t write bad software that crashed our customers’ machines – this wasn’t tied to a particular release we did. You can’t just test changes to your site and expect everything to be okay;
  • you can’t find every bug: to find this bug we would have had to constantly check our upcoming and production site against every upcoming version of Chrome on every operating system. Chrome isn’t like IE with releases every few years; you’d almost need a full-time tester just to perform this role; and
  • a website can blue screen of death your laptop.

AMA: How do you teach someone exploratory testing?

Paul asks…

How do you teach someone exploratory testing?

My response…

For something different, let me start with a quote:

“We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.”
~ T. S. Eliot

As a father of three children, I believe humans are innate explorers. So exploring a system should come naturally to most people, but I’ve found a lot of people can explore a system and not find any bugs.

Techniques like session-based testing attempt to introduce measurement and control to exploratory testing so that people are more effective, but, as the gorilla basketball video has shown us, introducing a goal for a session can blind us to the things that aren’t specifically in that goal – much like following a script can blind us to things that aren’t in the script.

So how do we teach people to find bugs by exploration?

I believe the biggest thing that stops people finding bugs by exploration is wilful blindness: choosing not to know. The way you can teach someone to be a better exploratory tester, therefore, is by teaching them to be less blind.

Margaret Heffernan explains this superbly well in her completely non-testing non-technical book that I think every tester should read:

“We make ourselves powerless when we choose not to know. But we give ourselves hope when we insist on looking. The very fact that wilful blindness is willed, that it is a product of a rich mix of experience, knowledge, thinking, neurons and neuroses, is what gives us the capacity to change it. Like Lear, we can learn to see better, not just because our brain changes but because we do. As all wisdom does, seeing starts with simple questions: what could I know, should I know, that I don’t know? Just what am I missing here?”

I really recommend reading that book.

Animated GIFs on Bug Reports

Sometimes the simplest solution is the best solution.

I like the idea of attaching screen recordings to bug reports to demonstrate an issue, as they provide more context than a single static screenshot or a series of screenshots.

In the past I have used QuickTime on Mac OSX to record a video file and attach that, but attaching/hosting/streaming video files is quite complicated for bug reports. At Automattic I have noticed people attaching animated GIFs instead: why didn’t I think of that?!?

There’s a really cool, yet oddly named, open source tool to capture screen recordings as animated GIFs: LICEcap. It works on Mac and Windows and allows you to easily choose a section of your screen, record it and save it as a GIF.

Here’s a quick recording of how to use LICEcap I just made (LICEcap recording LICEcap – how meta):

LICEcap demo

The best part is it’s so easy to attach the GIF to a Github issue, or a blog post, or in a Slack chat, and it’s instantly viewable by pretty much anyone.

Simplicity FTW.

Microservices: a real world story

Everywhere I turn I hear people talking about microservice architectures: it definitely feels like the latest over-hyped fad in software development. According to Martin Fowler:

“…the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.”

[link]

But what does this mean for software testing? And how does it work in the real world?

Well, my small team is responsible for maintaining/supporting a system that was developed from scratch using a microservices architecture. I must highlight that I wasn’t involved in the initial development of the system, but I am responsible for maintaining/expanding/keeping the system running.

The system consists of 30-40 REST microservices, each with its own codebase, git repository, database schema and deployment mechanism. A single-page web application (built in AngularJS) provides a user interface to these microservices.

Whilst there are already many microservices evangelists on board the monolith hate-train, my personal experience with this architectural style has been less than pleasant, for a number of reasons:

  • There is a much, much greater overhead (efficiency tax) involved in automating the integration, versioning and dependency management of so many moving parts.
  • Since each microservice has its own codebase, each microservice needs appropriate infrastructure to automatically build, version, test, deploy, run and monitor it.
  • Whilst it’s easy to write tests that test a particular microservice, these individual tests don’t find problems between the services or from a user experience point of view, particularly as they will often use fake service endpoints (see the sketch after this list).
  • Microservices are meant to be fault tolerant, as they are essentially distributed systems that are naturally erratic. But since they are micro, there are lots of them, which means the overhead of testing the various combinations of volatility of each microservice is too high (n factorial).
  • Monolithic applications, especially those written in strongly typed/static programming languages, generally have a higher level of application/database integrity at compile time. Since microservices are independent units, this integrity can’t be verified until run time. This means more testing in later development/test environments, which I am not keen on.
  • Since a lot of problems can’t be found in testing, microservices put a huge amount of emphasis on monitoring over testing. I’d personally much rather have confidence in testing something rather than relying on constant monitoring/fixing in production. Firefighting in production by development teams isn’t sustainable and leads to impacted efficiency on future enhancements.
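To show what I mean by individual service tests using fake endpoints (referenced in the list above), here’s a sketch using Mocha and the nock HTTP mocking library. The service, URL and client below are invented; the point is that the test keeps passing against its frozen fake response even when the real downstream service changes, so it can’t find integration problems.

    const nock = require('nock');
    const assert = require('assert');
    const fetch = require('node-fetch');

    // Hypothetical client for one microservice.
    const fetchOrder = (id) =>
      fetch(`http://orders.internal/orders/${id}`).then((res) => res.json());

    describe('orders client', () => {
      it('returns the order total', async () => {
        // Fake endpoint, frozen in time: this passes regardless of what the
        // real orders service is actually doing in production.
        nock('http://orders.internal')
          .get('/orders/42')
          .reply(200, { id: 42, total: 9.95 });

        const order = await fetchOrder(42);
        assert.strictEqual(order.total, 9.95);
      });
    });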

I can understand some of the reasoning behind breaking applications down into smaller, manageable chunks, but I personally believe that microservices, like any evangelist-driven approach, have taken this way too far.

I’ll finish by giving a real-world metric that shows just how much overhead and maintenance is involved in maintaining our microservices-architected system.

A change that would typically take us 2 hours to patch/test/deploy on our ‘monolithic’ strongly typed/static programming language system typically takes 2 days to patch/test/deploy on our microservices built system. And even then I am much less confident that the change will actually work when it gets to production.

Don’t believe the hype.

Addendum: Martin Fowler seems to have had a change of heart in his recently published ‘Microservice Premium’ article about when to use microservices:

“…my primary guideline would be don’t even consider microservices unless you have a system that’s too complex to manage as a monolith. The majority of software systems should be built as a single monolithic application. Do pay attention to good modularity within that monolith, but don’t try to separate it into separate services.”

[link]

Extensive post-release testing is a sign of an unhealthy testing process

Does your organization conduct extensive post-release testing in production environments?

If you do, then it shows you probably have an unhealthy testing process, and you’ve fallen into the “let’s just test it in production” trap.

If testing in non-production environments was reflective of production behaviour, there would be no need to do production testing at all. But often testing isn’t reflective of real production behaviour, so we test in production to mitigate the risk of things going wrong.

It’s also often the case that issues found in a QA environment don’t appear in a local development environment.

But it makes much more sense to test in an environment as close to where the code was written as possible: it’s much cheaper, easier and more efficient to find and fix bugs early.

For example, say you were testing a feature and how it behaves across numerous times of day across numerous time zones. As you progress through different test environments this becomes increasingly difficult to test:

  • In a local development environment: you could fake the time and timezone to see how your application behaves (see the sketch after this list).
  • In a CI or QA environment: you could change a single server’s time and restart your application to see how it behaves under various time scenarios: not as easy as ‘faking’ the time locally, but still fairly easy to do.
  • In a pre-production environment: you’ll probably have clustered web servers, so you’ll be looking at changing something like 6 or 8 server times to test this feature. Plus it will affect anyone else using this system.
  • In a production environment: you’ll need to wait until the actual time to test the feature, as you won’t be able to change the server times in production.
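Here’s a minimal sketch of faking the time locally, using Mocha and Sinon (the timezone, date and test subject are invented for illustration):

    // TZ is respected by Node on Linux/macOS, so the whole test run can
    // pretend to be in a different timezone without touching any server.
    process.env.TZ = 'Australia/Brisbane';

    const sinon = require('sinon');
    const assert = require('assert');

    describe('end-of-day behaviour', () => {
      let clock;

      beforeEach(() => {
        // Freeze 'now' at one minute to midnight, local time.
        clock = sinon.useFakeTimers(new Date('2016-06-30T23:59:00').getTime());
      });

      afterEach(() => clock.restore());

      it('still reports the current day just before midnight', () => {
        assert.strictEqual(new Date().getHours(), 23);
      });
    });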

Clearly it’s cheaper, easier and more efficient to test changing times in an environment closer to where the code was written.

You should aim to conduct as much testing as you can in earlier test environments and taper it off, so that by the time you get a change into production you’re confident it has been tested comprehensively. This probably requires some change to your testing process though.

Tests Performed per Environment

How to Remedy A ‘Test in Production’ Culture

As soon as you find an issue in a later environment, ask: why wasn’t this found in an earlier environment? Ultimately ask: why can’t we reproduce this in a local environment?

Some Hypothetical Examples

Example One: our tests fail in CI because of JavaScript errors that don’t reproduce on a local development environment. Looking into this we realize this is because the JavaScript is minified in CI but not in a local development environment. We make a change to enable local development environments to run tests in minified mode which reproduces these issues.

Example Two: our tests failed in pre-production but didn’t fail in QA, because pre-production has a regular backup of the production database whereas QA often gets very out of date. We schedule a task to periodically restore the QA database from a production snapshot to ensure the data is reflective of production.

Example Three: our tests failed in production but didn’t fail in pre-production, because email wasn’t being sent in production and we couldn’t test it in pre-production/QA as we didn’t want to accidentally send real emails. We configure our QA environments to send emails, but only to a white-list of specified email addresses we use for testing, to stop accidental emails. Now we can be confident that changes to emails are tested in QA.
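For that email example, one way the whitelist could be enforced in code (a sketch only – Node and Nodemailer shown for illustration, with invented hosts and addresses):

    const nodemailer = require('nodemailer');

    // Addresses we're happy to receive test email at.
    const WHITELIST = ['qa-inbox@example.com', 'tester@example.com'];

    const transport = nodemailer.createTransport({ host: 'smtp.qa.internal', port: 25 });

    async function sendQaMail(options) {
      // In QA, drop any recipient that isn't whitelisted so a test run can
      // never email a real customer by accident.
      const to = [].concat(options.to).filter((addr) => WHITELIST.includes(addr));
      if (to.length === 0) {
        return { skipped: true };
      }
      return transport.sendMail({ ...options, to });
    }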

Summary

It’s easy to fall into the trap of just testing things in production even though it’s much more difficult and risky: things often go wrong with real data, the consequences are more severe, and it’s generally harder to test comprehensively in production as you can’t change or fake things as easily.

Instead of just accepting “we’ll test it in production”, try instead to ask, “how can we test this much earlier whilst being confident our changes are reflective of actual behaviour?”

You’ll be much less stressed, your testing will be much more efficient and effective, and you’ll have a healthier testing process.

Testing beyond requirements? How much is enough?

At the Brisbane Software Testers Meetup last week there was a group discussion about testing beyond requirements/acceptance criteria and, if you do, how much is enough? Where do you draw the line? It came from an attendee whose manager pulled him up over a production bug that wasn’t found in testing but also wasn’t in the requirements. If it wasn’t in the requirements, how could he test it?

In my opinion, testing purely against requirements or acceptance criteria is never enough. Here’s why.

Imagine you have a set of perfectly formed requirements/acceptance criteria, which we’ll represent as this blue blob.

Requirements

Then you have a perfectly formed software system your team has built, represented by this yellow blob.

System

In a perfect, yet non-existent, world, all the requirements/acceptance criteria are covered perfectly by the system, and the system consists of only the requirements/acceptance criteria.

Requirements - System

But in the real world there’s never a perfect overlap. There are requirements/acceptance criteria that are either missed by the system (part A) or met by the system (part B). These can both be easily verified by requirements- or acceptance-criteria-based testing. But most importantly, there are things in your system that are not specified by any requirements or acceptance criteria (part C).

Requirements - System(1)

The things in part C often consist of requirements that have been made up (assumptions), as well as implicit and unknown requirements.

The biggest flaw in testing against requirements is that you won’t discover the things in part C, because they’re not requirements! But, as the example from the tester meetup shows, even though something may not be specified as a requirement, the business can consider it one when it affects usage.

Software development should aim to have as few assumptions, implicit requirements and unknown requirements in a system as reasonably possible. Different businesses, systems and software have different tolerances for how much effort is spent reducing the size of these unknowns, so there’s no one-size-fits-all answer to how much is enough.

But there are two activities that a tester can perform and champion on a team which can drastically reduce the size of these unknown unknowns.

1 – User Story Kick-Offs: I have only worked on agile software development teams over the last number of years, so all functionality that I test is developed in the form of a user story. I have found the best way to reduce the number of unknown requirements in a system is to make sure every user story is kicked off with a BA, tester and developer (often called The Three Amigos) all present, and each acceptance criterion is read aloud and understood by all three. At this point, as a tester, I like to raise items that haven’t been thought of, so that they can be specified as acceptance criteria rather than being left to make it (or not make it) into the system by other means or assumptions.

2 – Exploratory Testing: As a tester on an agile team I make time to not only test the acceptance criteria and specific user stories, but to explore the system and understand how the stories fit together and to think of scenarios above and beyond what has been specified. Whilst user stories are good at capturing vertical slices of functionality, their weakness, in my opinion, is they are just a ‘slice’ of functionality and often cross-story requirements may be missed or implied. This is where exploratory testing is great for testing these assumptions and raising any issues that may arise across the system.

Summary

I don’t believe there’s a clear answer to how much testing above and beyond requirements/acceptance criteria is enough. There will always be things in a system that weren’t in the requirements, and as a team we should strive to reduce the things that fall into that category as much as possible given the resources and time available. It isn’t just the tester’s role to test requirements, nor are testers solely responsible/accountable for requirements that aren’t specified; the team should own this risk.

Test your web apps in production? Stylebot can help.

I test in production way too much for my liking (more details in an upcoming blog post).


Testing in production is risky, especially because I test in a lot of different environments and they all look the same. I found the only way I could tell which environment I was in was by looking closely at the URL. This was problematic as it led to doing things in a production environment thinking I was using a pre-production or test environment – oops.

I initially thought about putting some environment-specific code/CSS into our apps to make the background colour different for each environment, but the solution was complex and it still couldn’t tell me at a glance that I was using production.

I recently found the Stylebot extension for Chrome, which allows you to locally tweak styles on any website you visit. I loaded the extension and added our production sites with the background colour set to bright red, so now I immediately know when I am using production: it’s bright red, be extra careful.

Stylebot Example

I’ve also set some other environments to contrasting bright colours (purple, yellow etc.) so I know from a quick glance which environment I am using.

I like this solution as I haven’t had to change any of our apps at all and it works in all environments: which is just what I needed.

Do you do something similar? Leave a comment below.

Software testers shouldn’t write code

Software testers shouldn’t write code. There I’ve said it.

“If you put too much emphasis on those [automated test] scripts, you won’t notice misaligned text, hostile user interfaces, bad color choices, and inconsistency. Worse, you’ll have a culture of testers frantically working to get their own code working, which crowds out what you need them to do: evaluate someone else’s code.”

~ Joel Spolsky on testers

I used to think that you could/should teach testers to write code (as it will make them better testers), but I’m now at a point where I think that it’s a bad idea to teach testers to code for a number of reasons:

  1. A software tester’s primary responsibility/focus should always be to test software. Adding a responsibility to also write code/software takes away from that primary focus. Testers will fall into a trap of sorting out their own coding issues instead of doing their actual job.
  2. If a software tester wants their primary focus to be writing code, they should become a software programmer. A lot of testers want to learn coding not because it will make them a better tester, but because they want to earn more money. These testers should aim to become programmers/developers if they want to code or think they can earn more money doing that.
  3. Developing automated tests should be done as part of developing the new/changed functionality (not separately). This has numerous benefits such as choosing the best level to test at (unit, integration etc.) at the right time. This means there isn’t a separate team lagging behind the development team for test coverage.
  4. Testers are great at providing input into automated test coverage but shouldn’t be responsible for creating that coverage. A tester working with a developer to create tests is a good way to get this done.

I think the software development industry would be a lot better if we expected programmers to be responsible for self-tested code using automated tests, and testers to be responsible for testing the software and testing the automated tests. Any tester wanting to code will move towards a programming job that allows them to do that, rather than trying to change what is expected of them in their role.

Update 19th Jan 2015: this post seems to have triggered a lot of emotion, let me clarify some things:

  • A tester having technical skills isn’t bad: the more technical skills a tester has the better – if they can interrogate a database or run an SQL trace they’ll be more efficient/effective at their job – and a tester can be technical without knowing how to code.
  • I don’t consider moving from testing into programming the only form of career advancement: some testers hate coding and that’s fine; others love coding, and I think it would be beneficial for them to become a programmer if they want to code more than they test.
  • I still believe everyone should take responsibility for their own career rather than expecting their employer/boss/industry leader/blogger to do it for them (more about this here).

What is a good ratio of software developers to testers on an agile team?

The developer:tester ratio question comes up a lot and I find most, if not all, answers are “it depends”.

I won’t say “it depends” (it’s annoying). I will tell you what works for me given my extensive experience, but will provide some caveats.

I’ve worked on different agile software development teams as a tester for a number of years and I personally find a ratio of 8:1 developers to tester(s) (me) works well (that’s 4 dev-pairs if pair programming). Any fewer developers and I am bored; any more and I have too much to test and cycle time is in jeopardy.

Some caveats:

  • I’m an efficient tester and the 8:1 ratio works well when there are 8 equally efficient programmers on the team – if the devs are too slow, or the user stories are too big, I get bored;
  • Everyone in the team is responsible for quality; I have to make sure that happens;
  • A story must be kicked off with the tester (me) present so I can question any assumptions/anomalies in the acceptance criteria before any code is written;
  • A story is only ready for test if the developer has demonstrated the functionality to me at their workstation (bonus points in an integrated environment) – we call this a ‘shoulder check’ – much the same way as monkeys check each other’s shoulders for lice;
  • A story is also only ready for test if the developer has created sufficient and passing automated test coverage including unit tests, integration tests (if appropriate) and some acceptance tests; and
  • Bug fixes take priority over new development to ensure flow.

What ratio do you find works for you?