AMA: Time Estimation

Paul asks…

What is your stance on time estimation (involved people, granularity/level of detail, benefit)?

My response…

I’d like to start by stating that I’m by no means an expert on this topic, so please take what you will from what I write.

Time and effort estimation for any software development activity is very difficult to do, so we often get our estimates very wrong. I believe this is because we try to do up-front estimation without fully understanding the domain or the extent of the problem we’re solving; we still have many unknown unknowns.

We can still do detailed/granular planning, but we should try to delay detailed estimation until we have more information.

What I prefer is detailed planning up front, which involves breaking large, lofty goals down into small goals. These small goals are broken down further into the smallest manageable unit of work that delivers something, however small that something is. It’s important to break things down to this level because it enables continuous delivery and flexibility in scope as a project progresses.

Once these small units of work are detailed, rather than estimating them straight away, I think there’s value in starting work and delivering some of them. This makes it possible to estimate the remaining work more accurately, based upon real delivery experience.

As soon as you begin working on these units you’ll get a feel for the size and effort each one requires, and over a period of time (say a fortnight) you can start to work out how many of these units you can achieve (your velocity).

If you’ve got a detailed plan of how many units you’d like to achieve in total, it’s probably at this point that you realise what you wanted to achieve is going to take too long or cost too much. This realisation means you need to prioritise all remaining work and focus on what is high priority.
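To make that concrete, here’s a back-of-the-envelope sketch in Ruby (every number here is invented for illustration):

    # Hypothetical numbers for illustration only.
    completed_last_fortnight = 6    # units of work delivered in the last iteration
    remaining_units          = 45   # units still on the detailed plan

    velocity        = completed_last_fortnight
    fortnights_left = (remaining_units.to_f / velocity).ceil

    puts "At ~#{velocity} units a fortnight, roughly #{fortnights_left} fortnights of work remain."
    # => At ~6 units a fortnight, roughly 8 fortnights of work remain.

It’s crude, but it turns ‘how long will this take?’ from a guess into a conversation grounded in what the team has actually delivered.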

I’ve never seen a project finish with the same intentions as when it started, so as you progress you will find some items get completely de-prioritised (no longer in scope), some things become higher priority so they get delivered sooner, and some completely new ideas/pieces of functionality may be decided upon and included in your plan.

Since you understand what you’ve been able to deliver, you can then have sensible conversations about what is feasible given the resources available.

The craziest bug I have ever seen

Imagine someone came to you and told you that your website was causing their laptop to throw a fatal system error, the dreaded ‘blue screen of death’. What would your response be?

Well, I know what my response would be, because it happened to me: “No way! That can’t happen! A website can’t make a computer BSOD!” I would have bet $1000 on that. Turns out I was wrong; very wrong.

I was working for a very popular pizza delivery chain, and one day our development team began receiving reports of customers complaining (mostly via social media) that our site was causing their computers to throw blue screen of death errors! We laughed it off: yeah right, that can’t happen! Crash a browser tab maybe, but not an operating system. But we tried to reproduce it anyway, on the large number of laptops we had, and no matter how hard we tried we couldn’t.

A few days later a member of our customer support team appeared, saying our site had just BSOD’d his laptop! We were curious, very curious. So we started the laptop back up, visited our site and voila: BSOD!

Now here’s where you might not believe me, so I took a video of it as proof, for your enjoyment:

Now that we had a single laptop that consistently reproduced the BSOD we could work out why it was happening.

It was only happening in Chrome, and only on this single laptop, which ran Windows 8. We built and ran our site on a developer’s machine, accessed it from this laptop, and could reproduce the crash every time.

We noted that its version of Chrome was one version behind the latest on every other laptop we had – updates had somehow stopped, leaving it stuck on that version.

But we didn’t update Chrome, as that would have destroyed the single machine that was reproducing our issue! (It is pretty much impossible to find older versions of Chrome.)

Since we had our site running on a developer’s machine and reproducing the issue, all we could do was remove the HTML/CSS/JavaScript piece by piece until we discovered what was causing it.

After some time (it was a lengthy process, as each test resulted in a reboot), we removed a resource reference to a font our site used – actually a Google Font on their CDN – and the BSODs suddenly stopped. We added the reference back and voila: it crashed.

A reference to a Google Font on our site was giving our customers Blue Screens of Death.

After some research we discovered it was a Chromium bug that affected Windows only (all versions), introduced while Chromium/Chrome were working on native font rendering for Windows. Google were very quick to patch the issue; however, anyone stuck on an older version would still be affected: there wasn’t anything we could do about it but inform our customers and ensure they were on the latest version of Chrome.

I learned a few lessons that day:

  • bugs can happen anywhere and cause damage that you can’t imagine;
  • bugs aren’t always in your control: we didn’t write bad software that crashed our customers’ machines – this wasn’t tied to a particular release that we did. You can’t just test changes to your site and expect everything to be okay;
  • you can’t find every bug: to find this bug we would have had to constantly check our upcoming and production sites against every upcoming version of Chrome on every operating system. Chrome isn’t like IE, with releases every few years; you’d almost need a full-time tester just to perform this role; and
  • a website can blue screen of death your laptop.

Futurespectives are fun

Since my team (and every team at Automattic) is 100% distributed, it’s important that we meet in person a few times a year (somewhere in the world) to hang out, co-work, eat and plan together: we call these team meetups.

Two weeks ago I spent the week in La Jolla in beautiful Southern California working with my team. Each team member was asked to suggest activities/projects to work on for the meetup and I suggested we do a futurespective.

Most people are familiar with a retrospective as they’re very common in agile software development, but I’ve found futurespectives to be much less common.

A futurespective is an activity where a team can work together to create a shared vision for the future.

There’s not a huge amount of information online about how to facilitate a futurespective, so I went with this structure:

  1. Prime directive (5 mins)
  2. Check-in: clear the air (5 mins)
  3. Explain the purpose of the exercise: what we are aiming to get out of this (5 mins)
  4. Move to the future: Imagine a nirvana state (20 mins)
  5. Coming back: Success factors that got us there (20 mins)
  6. Now: what can we do to start achieving those success factors (20 mins)

Prime Directive

I found this prime directive online, and whilst it sounds a little cheesy, it set the tone for the exercise, which is about working together for a better future:

‘Hope and confidence come from proper involvement and a willingness to predict the unpredictable. We will fully engage on this opportunity to unite around an inclusive vision, and join hands in constructing a shared future.’ – Paulo Caroli and TC Caetano

Check-In

There’s no point working on a team exercise to plan for the future if there’s something in the air, so it’s worthwhile checking in on the team and how everyone is feeling about the current state of things.

Explaining the Purpose of the Exercise

The prime directive is a good start for this, but it’s worth explaining that the team will be brainstorming and working together to arrive, at the end of the exercise, at a list of action items that will directly impact our future.

Move to the Future: Imagine a Nirvana State (20 mins)

This is where you start by setting the scene 12–18 months in the future, at a point where a particular milestone has been successfully achieved – this might be finishing a big project you’re working on, or having launched a new product. This is the nirvana state. Ask a question that you would like answered by this exercise, for example: ‘what does testing and quality look like on this day?’

Get each person to spend 10 minutes writing sticky notes about the state your question asks about: what it is like, without delving into how it got to be that way.

An example might be: ‘everyone is confident in every launch’ or ‘everyone knows what the right thing to work on is’.

As each person finishes, we put the sticky notes on a wall and logically group them, then vote on which are most important (each person typically gets three votes, marking three notes or groups with a sharpie).

Coming back: Success factors that got us there (20 mins)

From the first exercise you should have a list of the three or four most important end-states, and we now use these to brainstorm, for about 10 minutes, the success factors (the ‘hows’) that got us to those end-states.

For example, a success factor for ‘everyone is confident in every launch’ could be ‘unit tests are super easy to write/run all the time (fast)’.

Once people have had time to write these up, we logically group them under our three or four headings on the wall so we can see them clearly.

Now: what can we do to start achieving those success factors (20 mins)

Our final activity is working out what we can do now that will lead to these success factors, which in turn will get us to our end goals. At this point you can either brainstorm again, or simply start discussing it as a team.

If you need some structure you could use “Start Doing/Stop Doing/Keep Doing” to prompt for ideas; otherwise use any format you want.

The goal here is, after 20 minutes, to have a list of action items that you can easily assign to someone, knowing that these will lead to the success factors and the end goals you’ve come up with as a team.

An example would be ‘ensure that 100% of bugs are logged in one tool (GitHub)’, which can be assigned to someone.

Ensure someone is tasked with taking photos, writing up the findings (at least the action items) and circulating these to the team.

Summary

The futurespective we ran as a team was very useful, as it had enough structure to enable us to get through a lot of thinking in a short amount of time. We did it on the first morning of our meetup, and having this structured activity set the tone for the week: we could refer back to what we’d discussed in later activities.

I thoroughly recommend this as a team planning tool.


Eight thoughts on my Apple Watch

I’m a fairly late adopter: I bought an Apple Watch just a few weeks ago after the hype had settled down a bit and I could just walk in, try one on and buy one.

I bought the 42mm ‘sport’ model because I’ve got big wrists and my main intention with the watch is to measure various aspects of exercise I do.

Here are some initial thoughts:

  1. The water resistance is really cool: whilst the touchscreen doesn’t work well under water, I wear it in the shower, and I’ve worn it swimming in the pool and in the surf without any issues. It makes me wonder why we can’t make all our portable devices this waterproof. Apparently it’s not actually waterproof (see comment below) and this isn’t recommended.
  2. The battery isn’t that bad: I charge it overnight, monitor an hour or so of exercise most days, and still get to the end of the day with 50-60% of battery remaining. It could be better and last multiple days, but since I charge it overnight it doesn’t bother me.
  3. The notifications are awesome: the best part for me is that by default the notifications mirror your iPhone. I have minimal notifications set up (none for email etc.) so I get minimal notifications on the watch. Apps don’t need to support the watch or be installed on it to send notifications to the watch. Plus, if you’re using your phone your watch doesn’t notify you, and vice versa. They’ve done really well with this.
  4. I don’t really use watch apps: there will probably be better ones with the new watchOS that supports native apps, but the main purpose for me is glancing at my watch face and quickly seeing notifications. The only app I really use is the exercise one from Apple, which monitors your heart rate, distance etc. when you’re exercising.
  5. I use the modular watch face: it offers a good range of information I can glance at. Some of the other watch faces are fancy, but I can only see myself using them as a one-off.
  6. The activity rings are a good idea: especially the standing ring, which notifies you towards the end of an hour in which you haven’t stood up. Great.
  7. Transferring anything to the watch is really slow, and updates are really slow to install. But these happen so infrequently that it doesn’t really matter.
  8. Nightstand mode is half done: I’d like it to be like a school alarm clock and always show the time in the dark, but unfortunately it only shows anything when tapped. That kinda defeats the purpose. Maybe a future update will add an always-on option.

Do you own an Apple Watch or another smart watch? What do you think of it?

How can open source projects deliver high quality software without dedicated testers?

I recently received the following email from WatirMelon reader Kiran, and was about to reply with my answer when instead I asked if I could reply via a blog post, as I think it’s an interesting topic.

“I see most of the Open source projects do not have a dedicated manual QA team to perform any kind of testing. But every Organization has dedicated manual QA teams to validate their products before release, yet they fail to meet quality standards.

How does these open source projects manage to deliver stuff with great quality without manual testers? (One reason i can think of is, developers of these projects have great technical skills and commitment than developers in Organizations).

Few things I know about open source projects is that they all have Unit tests and some automated tests which they run regularly. But still I can’t imagine delivering something without manual testing… Is it possible?”

I’ll start by stating that not all organizations have dedicated manual QA teams to validate their products before release. I used the example of Facebook in my book, and I presently work in an organization where there isn’t a dedicated testing team. But generally speaking I agree that most medium to large organizations have testers of some form, whereas most open source projects do not.

I think the quality of open source comes down to two key factors which are essential to high quality software: peer reviews and automated tests.

Open source projects by their very nature need to be open to contribution from various people. This brings about great benefit, as you get diversity of input and skills, and are able to utilize a global pool of talent, but with this comes the need for a safety net to ensure quality of the software is maintained.

Open source projects typically work on a fork/pull request model where all work is done in small increments in ‘forks’ which are provided as pull requests to be merged into the main repository. Distributed version control systems allow this to happen very easily and facilitate a code review system of pull requests before they are merged into the main repository.

Whilst peer reviews are good, they aren’t a replacement for testing, and this is where open source projects need to be self-testing via automated tests. Modern continuous integration systems like CircleCI and TravisCI allow automatic testing of all new pull requests to an open source project before they are even considered for merging.
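As a rough sketch of what that safety net can look like (the project layout here is hypothetical), it is often as simple as a default test task that the CI service runs against every pull request:

    # Rakefile — a minimal test task for a hypothetical Ruby project,
    # which a CI service such as TravisCI or CircleCI runs on every pull request.
    require 'rake/testtask'

    Rake::TestTask.new(:test) do |t|
      t.libs << 'test'
      t.pattern = 'test/**/*_test.rb'   # run the entire automated suite
    end

    # `rake` exits non-zero if any test fails, which fails the CI build.
    task default: :test

A non-zero exit code fails the build, and most projects configure their repository so that a failing build blocks the pull request from being merged.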

How TravisCI Pull Requests Work (image from TravisCI)

If you have a look at most open source project pages you will most likely see a prominent ‘build status’ badge indicating the real-time quality of the software.

Bootstrap’s GitHub Page

Peer reviews and automated tests cover contributions and regression testing, but how does an open source project test new features?

Most open source projects test new changes in the wild through dogfooding (open source projects often exist to fill a need and open source developers are often consumers of their own products), and pre-release testing like alpha and beta distributions. For example, the Chromium project has multiple channels (canary, dev, beta, stable) where anyone can test upcoming Chromium/Chrome features before they are released to the general public (this isn’t limited to open source software: Apple does the same with OSX and iOS releases).

By using a combination of peer reviews, extensive automated regression testing, dogfooding and making pre-release candidates available, I believe open source projects can release very high quality software without having dedicated testers.

If an organization would like to move away from having a dedicated, separate test team towards smaller self-sufficient delivery teams responsible for quality into production (as my present organization has), it would need to follow these same practices: peer reviews and a very high level of automated test coverage. I still believe there’s a role for a tester on such a team: advocating for quality, making sure that new features/changes are appropriately tested, and ensuring the automated regression test coverage is sufficient.

My useless websites

We recently had a competition at work where you had to create a ‘useless website’. There weren’t many rules to the contest (make it publicly accessible, SSFW, enter as many times as you like), so I decided to hedge my bets and create/submit half a dozen simple sites, all using the same concept of randomly generating something.

It was a good example of disposable software: I could churn out an entire site in 10 or 15 minutes, including publishing it live on GitHub, and didn’t have to worry about tests, technical debt or any such thing. It was really fun.
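The core of every entry was essentially the same trick; here’s a minimal Ruby sketch of the idea (the topping list is invented, not what the real site uses):

    # The shared concept behind all six sites: pick something at random.
    TOPPINGS = %w[pepperoni pineapple olives anchovies haloumi crocodile]

    def random_pizza(topping_count = 3)
      TOPPINGS.sample(topping_count).join(', ')
    end

    puts "Your pizza: #{random_pizza}"
    # e.g. "Your pizza: olives, pepperoni, anchovies"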

I ended up with a runner-up award for the sloth site; the winner was my very talented colleague James with his ‘Potato Simulator 2015’.

Here are the six sites I created in a week:

  • Pizza Generator: randomly generate a succulent pizza (with bonus exotic mode)
  • Drink Tea Every Day: Australian Tea Tally
  • Are you faster than a sloth?
  • Quote of the Day
  • Are you taller than a giraffe?
  • Ralph says…

100,000 e2e selenium tests? Sounds like a nightmare!

This story begins with a promo email I received from Sauce Labs…

“Ever wondered how an Enterprise company like Salesforce runs their QA tests? Learn about Salesforce’s inventory of 100,000 Selenium tests, how they run them at scale, and how to architect your test harness for success”

saucelabs email

100,000 end-to-end selenium tests and success in the same sentence? WTF? Sounds like a nightmare to me!

I dug further and got burnt by the molten lava: the slides confirmed my nightmare was indeed real:

Salesforce Selenium Slide

“We test end to end on almost every action.”

Ouch! (and yes, that is an uncredited image from my blog used in the completely wrong context)

But it gets worse. Salesforce have 7500 unique end-to-end WebDriver tests which are run on 10 browsers (IE6, IE7, IE8, IE9, IE10, IE11, Chrome, Firefox, Safari & PhantomJS) on 50,000 client VMs that cost multiple millions of dollars, totaling 1 million browser tests executed per day (which equals 20 Selenium tests per day, per machine, or over 1 hour to execute each test).
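To sanity-check those numbers (using only the figures quoted above):

    # Deriving the per-machine figures from the slide's numbers.
    unique_tests       = 7_500
    browsers           = 10
    executions_per_day = 1_000_000
    vms                = 50_000

    per_full_run = unique_tests * browsers            # => 75,000 executions per full run
    runs_per_day = executions_per_day / per_full_run  # => ~13 full runs a day
    per_vm_day   = executions_per_day / vms           # => 20 tests per VM per day
    minutes_each = (24 * 60) / per_vm_day             # => 72 minutes per test

    puts "#{per_vm_day} tests per VM per day, i.e. one every #{minutes_each} minutes"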

Salesforce UI Testing Portfolio

My head explodes! (and yes, another uncredited image from this blog used out of context and with my title removed).

But surely that’s only one place right? Not everyone does this?

A few weeks later I watched David Heinemeier Hansson say this:

“We recently had a really bad bug in Basecamp where we actually lost some data for real customers and it was incredibly well tested at the unit level, and all the tests passed, and we still lost data. How the f*#% did this happen? It happened because we were so focused on driving our design from the unit test level we didn’t have any system tests for this particular thing.
…And after that, we sort of thought, wait a minute, all these unit tests are just focusing on these core objects in the system, these individual unit pieces, it doesn’t say anything about whether the whole system works.”

~ David Heinemeier Hansson – Ruby on Rails creator

and read that he had written this:

“…layered on top is currently a set of controller tests, but I’d much rather replace those with even higher level system tests through Capybara or similar. I think that’s the direction we’re heading. Less emphasis on unit tests, because we’re no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests (Which btw do not need to be so slow any more, thanks to advances in parallelization and cloud runner infrastructure).”

~ David Heinemeier Hansson – Ruby on Rails creator

I started to get very worried. David is the creator of Ruby on Rails and very well respected within the Ruby community (despite being known to be very provocative and anti-intellectual: the ‘Fox News’ of the Ruby world).

But here is dhh telling us to replace lower-level tests with higher-level ‘system’ (end-to-end) tests that use something like Capybara to drive a browser, because unit tests didn’t find a bug, and because it’s now possible to parallelize these ‘slow’ tests? Seriously?

Speed has always been seen as the Achilles’ heel of end-to-end tests, because everyone knows that fast feedback is good. But parallelization solves this, right? We just need 50,000 VMs like Salesforce?

No.

Firstly, parallelization of end-to-end tests introduces its own problems, such as what to do with tests that you can’t run in parallel (for example, ones that change the global state of a system, such as a system message that appears to all users), and it definitely makes test data management trickier. You’ll be surprised the first time you run an existing suite of sequential e2e tests in parallel: a lot will fail for unknown reasons.
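Here’s a contrived Ruby sketch of the global-state problem (SystemBanner is a hypothetical stand-in for any system-wide toggle): both tests pass when run sequentially, but run in parallel either can observe the other’s banner mid-flight:

    require 'minitest/autorun'

    # Hypothetical stand-in for app-wide state, e.g. a banner shown to all users.
    class SystemBanner
      class << self
        attr_reader :current

        def set(message)
          @current = message
        end

        def clear
          @current = nil
        end
      end
    end

    class BannerTest < Minitest::Test
      def test_maintenance_banner_shows
        SystemBanner.set('Maintenance at midnight')   # mutates global state!
        assert_equal 'Maintenance at midnight', SystemBanner.current
      end

      def test_no_banner_by_default
        SystemBanner.clear                            # mutates the same state!
        assert_nil SystemBanner.current
      end
    end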

Secondly, the feedback to someone who’s made a change still isn’t fast enough to give them confidence in that change: by the time your app has been deployed and the parallel end-to-end tests have run, the person who made the change has most likely moved on to something else.

But the real problem with end-to-end tests isn’t actually speed. The real problem is that when end-to-end tests fail, most of the time you have no idea what went wrong, so you spend a lot of time trying to find out why. Was it the server? Was it the deployment? Was it the data? Was it the actual test? Maybe a browser update broke Selenium? Was the test flaky (non-deterministic or non-hermetic)?

Rachel Laycock and Chirag Doshi from ThoughtWorks explain this really well in their recent post on broken UI tests:

“…unlike unit tests, the functional tests don’t tell you what is broken or where to locate the failure in the code base. They just tell you something is broken. That something could be the test, the browser, or a race condition. There is no way to tell because functional tests, by definition of being end-to-end, test everything.”

So what’s the answer? On one hand you have David’s FUD about unit testing not catching a major bug in Basecamp. On the other hand you face the issue that a large suite of end-to-end tests will most likely see you spending all your time investigating test failures instead of delivering new features quickly.

If I had to choose just one, I would definitely choose a comprehensive suite of automated unit tests over a comprehensive suite of end-to-end/system tests any day of the week.

Why? Because it’s much easier to supplement comprehensive unit testing with human exploratory end-to-end system testing (and you should anyway!) than to verify individual units from the higher system level; and, as explained above, it’s much easier to know why a unit test is broken. It’s also much easier to add automated end-to-end tests later than to retrofit unit tests later (because your code probably won’t be testable, and making it testable after the fact can introduce bugs).
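To make the contrast concrete, here’s a sketch in Ruby with minitest and Capybara (the class, page and prices are all invented): when the unit test fails the fault can only be in PriceCalculator#total, but when the end-to-end test fails it could be anything in the stack.

    require 'minitest/autorun'
    require 'capybara/dsl'

    # A hypothetical unit under test.
    class PriceCalculator
      def initialize(items:, delivery:)
        @items    = items
        @delivery = delivery
      end

      def total
        @items + @delivery
      end
    end

    # Unit test: a failure points directly at PriceCalculator#total.
    class PriceCalculatorTest < Minitest::Test
      def test_total_includes_delivery_fee
        assert_equal 15.50, PriceCalculator.new(items: 10.50, delivery: 5.00).total
      end
    end

    # End-to-end test: a failure could be the app, the deploy, the data,
    # the browser, the driver or the test itself.
    class CheckoutFlowTest < Minitest::Test
      include Capybara::DSL

      def test_customer_sees_total_at_checkout
        visit '/checkout'   # assumes Capybara is configured with a driver/app elsewhere
        assert page.has_content?('Total: $15.50')
      end
    end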

To answer our question, let’s imagine for a minute that you were responsible for designing and building a new plane. You obviously need to test that your new plane works. You build a plane by creating parts (units), putting these together into components, and then putting all the components together to build the (hopefully) working plane (system).

If you only focused on unit tests, like David mentioned in his Basecamp example, you could be pretty confident that each piece of the plane had been tested well and works correctly, but you wouldn’t be confident it would fly!

If you only focused on end-to-end tests, you’d need to fly the plane to check that the individual units and components actually work (which is expensive and slow), and even then, if/when it crashed, you’d need to examine the black box to work out which unit or component didn’t work, as we currently do when end-to-end tests fail.

But, obviously we don’t need to choose just one. And that’s exactly what Airbus does when it’s designing and building the new Airbus A350:

As with any new plane, the early design phases were riddled with uncertainty. Would the materials be light enough and strong enough? Would the components perform as Airbus desired? Would parts fit together? Would it fly the way simulations predicted? To produce a working aircraft, Airbus had to systematically eliminate those risks using a process it calls a “testing pyramid.” The fat end of the pyramid represents the beginning, when everything is unknown. By testing materials, then components, then systems, then the aircraft as a whole, ever-greater levels of complexity can be tamed. “The idea is to answer the big questions early and the little questions later,” says Stefan Schaffrath, Airbus’s vice president for media relations.

The answer, which has been the answer all along, is to have a balanced set of automated tests across all levels, with a disciplined approach: a larger number of smaller, specific automated unit/component tests and a smaller number of larger, general end-to-end automated tests to ensure all the units and components work together. (My diagram below, with attribution.)

Automated Testing Pyramid

Having just one level of tests, as shown by the stories above, doesn’t work (but if it did, I would rather have automated unit tests). Just as a diet of chocolate alone doesn’t work, nor does a diet that deprives you of anything sweet or enjoyable (but if I had to choose, I would rather a diet of only healthy food than one of just chocolate).

Now, if only we could convince Salesforce to be more like Airbus and not fly a complete plane (or 50,000 planes) to test everything every time they make a change, and stop David from continuing his anti-unit, pro-system-testing, anti-intellectual rampage, which will do our industry more damage than it’s worth.

Waterfall, Agile Development & Hyperbole

Hyperbole. Love it or hate it, it’s been around for centuries and is here to stay. And, as someone pointed out this week, I’m guilty as charged of using (abusing?) it on this blog. You just need to quickly flick through my recent posts to find melodramatic titles such as ‘Do you REALLY need to run your WebDriver tests in IE?’, ‘UI automation of vendor delivered products always leads to trouble’, and ‘Five signs you’re not agile; you’re actually mini-waterfall’. Hyperbole supports my motto for this blog and my life: strong opinions, weakly held.

But it’s not just me who likes hyperbole mixed into their blog posts. Only this morning I read the catchily titled ‘Waterfall Is Never the Right Approach’, followed quickly by a similarly catchy rebuttal: ‘Why waterfall kicks ass’ (I personally would have capitalized ‘NEVER’ and ‘ASS’).

While I found both articles interesting, I think they both missed the key difference between waterfall and agile software development (and the reason waterfall rarely works in these fickle times): waterfall is sequential, whereas agile development is (at least meant to be) iterative.

I personally don’t care whether you do Scrum or XP, whether you write your requirements in Word™ or on the back of an index card, or even whether you stand around in a circle talking about which card you’re working on.

What I do care about is whether you’re delivering business value frequently and adjusting to the feedback you get.

Sequential ‘big bang’ development such as waterfall by its nature delivers business value less frequently, and chances are that by the time that value is realized the original problem will have changed (depending on how long ago it was defined), because, as I said, we live in fickle times.

Iterative development addresses this by developing/releasing small fully functional pieces of business value iteratively and adjusting to feedback/circumstance.

Just because an organization practices what it calls ‘agile’ doesn’t mean it’s delivering business value iteratively. I’ve seen plenty of ‘agile’ projects deliver business value very infrequently: they put a sequential process into agile ‘sprints’, followed by a long period of end-to-end, business and user acceptance testing, and a ‘big bang’ go-live.

Whilst I believe iterative development is the best way to work, I’m not dogmatic (enough) to believe it’s the only way to work. Whilst I believe you could build and test parts of, say, an aeroplane iteratively, I still hope it’s a sequential process with a whole heap of testing at the end, on a fully complete aeroplane, before I take my next flight in it.

Five signs you’re not agile; you’re actually mini-waterfall

Update: I’ve added five remedies to make you less waterfall in a separate post

I’ve noticed a lot of projects call themselves agile when in fact they’re mini-waterfall, also known as scrumfall. Here are five warning signs that you’ll see if you fall into that category:

  1. Development of your user stories seems to take almost all of the iteration, and stories only move to ‘ready for test’ during the afternoon of the last day of the iteration
  2. You have a whole lot of user stories that are waiting on business ‘signoff’ and can’t be worked on
  3. You have a large chunk of time set aside at the end of the project for ‘user acceptance testing’
  4. Team members live in fear of changing something or moving a story card around, as they’re scared of being ‘told off’
  5. You develop in iterations but only release everything big bang at the end, when everything is considered ‘done’

Long live the analyst-programmer

“Getting things done means doing things you might not be interested in. No matter how sexy a project is, there are always boring tasks. Tedious tasks. Tasks that a less mature engineer may deem beneath their dignity or their job title.”

~ John Allspaw

Once upon a time, before we called ourselves agile, there lived a role called an analyst-programmer. The analyst-programmer was a generalist before generalists became cool: just as content to analyze a requirement as to write some code and implement it.

Along came agile software development and its disturbing trend towards senior developers who are above anything but pure coding. Writing SQL scripts for reference data, analyzing what is actually required, configuring a CI build: these are all tedious tasks that take away from what the senior developer is supposedly entitled to do: just writing code to meet explicit acceptance criteria. The senior developer expects a flock of paradevs to run around doing their analysis, writing their acceptance criteria, and finally testing the code they write. Some even expect the paradev to read the acceptance criteria aloud to them, because reading it themselves isn’t coding.

You’ll start to notice who these senior developers are when you hear them say things like “I get paid too much to do this”, or “why are you wasting my time having me do this?”.

One day I imagine a world where all software development roles are suitably generalist and humble, that instead of complaining that “I’m too good for this”, people in these roles simply get their hands dirty and get things done.