A tale of three ruby automated testing APIs (redux)

Redux Note: I originally wrote a similar article to this before going on parental leave about six weeks ago. Whilst I didn’t intend to offend, it seemed that a few people took my article the wrong way. I understand that a lot of effort goes into creating a web testing API, but that doesn’t mean that everyone will agree with what you’ve made.

Sadly, an anonymous coward attacked me and the company I work for (even though I don’t mention that company on this blog), so for the first time in this blog’s history, I have had to turn comment moderation on. I am sorry to the other genuine commenters whose comments were lost in the transition, and who now have to wait for their new comments to be approved.

Since then I have received numerous emails asking where my article went, with people saying they found it interesting and worthwhile. So I have decided to repost this article, hopefully with a little less contention this time around, and to make it clear that this is my opinion and experience: YMMV.


As a consultant I get to see and work on a lot of automated testing solutions using different automated web testing APIs. Lately I’ve been thinking about how these APIs are different and what makes them so.

My main interest is in ruby, and fortunately ruby has three solid examples of three different kinds of web testing APIs, two of which extend the lowest level API: selenium-webdriver.

I’ll (try to) explain here what I consider to be three kinds of automated web testing APIs, where I consider the sweet spot to be, and why.

A meaty example

As a carnivore, I thought I would explain my concept in terms I can relate to. If you’re a beef eater, there are many different kinds of beef that you can use to make some tasty food to eat. I’ll use three different kinds of beef for my example. The first (rawest) kind would involve getting a beef carcass and filleting it yourself to eventually make some edible food. The second kind of beef you could use is beef that is already in a slightly usable form, but you can then use yourself to make some edible food. For example, you can buy minced beef at a butcher, and then make your own hamburger patties, taco fillings etc from it. The final type of beef you could use is beef that has already been prepared so you can directly consume it, for example, sausages which can be cooked and consumed as is.

I consider these three kinds of beef to roughly correlate to automated web testing APIs, of which I also consider there to be three kinds.

The first is a Web Driver API, which is the rawest form of an API: its job is to drive a browser by issuing it commands. It provides a high level of user control, but like filleting a beef carcass it’s more ‘work’. An example in ruby of this API is the selenium-webdriver API, which controls the browser using the webdriver drivers.

The second kind of automated web testing API is the Browser API, which is a higher level API but still provides user control. This is the minced beef of APIs: whilst it’s in a more usable form than a carcass, you still have a lot of control over (and options for) what you do with it. An example in ruby of this API is the watir-webdriver API, which uses the underlying selenium-webdriver carcass to control the browser.

The final kind of automated web testing API is the Web Form DSL (Domain Specific Language) which is a very high level API that provides users with specific methods to automate web forms and their elements. This is the beef sausages of APIs as sometimes you feel like eating something else besides sausages, but it’s difficult to make anything else edible but sausages from sausages. An example in ruby of this Web Form DSL is the Capybara DSL.

Visually, this looks something like this:

Show me the code™

So exactly what do these APIs look like?

I knew you’d ask, that’s why I came prepared.

Say I want to accomplish a fairly basic scenario on my example Google Doc form:

  • Start a browser
  • Navigate to the watir-webdriver-demo form
  • Check whether text field with id ‘entry_0’ exists (this should exist)
  • Check whether text field with id ‘entry_99’ exists (this shouldn’t exist)
  • Set a text field with id ‘entry_0’ to ‘1’
  • Set a text field with id ‘entry_0’ to ‘2’
  • Select ‘Ruby’ from select list with id ‘entry_1’
  • Click the Submit button

This is how I would do it in the three different APIs:

# * Start browser
# * Navigate to watir-webdriver-demo form
# * Check whether text field with id 'entry_0' exists
# * Check whether text field with id 'entry_99' exists
# * Set text field with id 'entry_0' to '1'
# * Set text field with id 'entry_0' to '2'
# * Select 'Ruby' from select list with id 'entry_1'
# * Click the Submit button

require 'bench'

benchmark 'selenium-webdriver' do
  require 'selenium-webdriver'

  driver = Selenium::WebDriver.for :firefox
  driver.navigate.to 'http://bit.ly/watir-webdriver-demo'
  begin
    driver.find_element(:id, 'entry_0')
  rescue Selenium::WebDriver::Error::NoSuchElementError
    # doesn't exist
  end
  begin
    driver.find_element(:id, 'entry_99').displayed?
  rescue Selenium::WebDriver::Error::NoSuchElementError
    # doesn't exist
  end
  driver.find_element(:id, 'entry_0').clear
  driver.find_element(:id, 'entry_0').send_keys '1'
  driver.find_element(:id, 'entry_0').clear
  driver.find_element(:id, 'entry_0').send_keys '2'
  # find_element takes a single locator, so match the option's value attribute via CSS
  driver.find_element(:id, 'entry_1').find_element(:css, "option[value='Ruby']").click
  driver.find_element(:name, 'submit').click
end

benchmark 'watir-webdriver' do
  require 'watir-webdriver'
  b = Watir::Browser.start 'bit.ly/watir-webdriver-demo', :firefox
  b.text_field(:id => 'entry_0').exists?
  b.text_field(:id => 'entry_99').exists?
  b.text_field(:id => 'entry_0').set '1'
  b.text_field(:id => 'entry_0').set '2'
  b.select_list(:id => 'entry_1').select 'Ruby'
  b.button(:name => 'submit').click
end

benchmark 'capybara' do
  require 'capybara'
  session = Capybara::Session.new(:selenium)
  # navigate to the form (full URL, scheme included)
  session.visit 'http://bit.ly/watir-webdriver-demo'
  session.has_field?('entry_0') # => true
  session.has_no_field?('entry_99') # => true
  session.fill_in('entry_0', :with => '1')
  session.fill_in('entry_0', :with => '2')
  session.select('Ruby', :from => 'entry_1')
  session.click_button 'Submit'
end

run 10

This is how long they took for me to run:

                        user     system      total        real
selenium-webdriver  1.810000   0.840000  22.130000 ( 73.123340)
watir-webdriver     1.940000   0.870000  24.380000 ( 79.388494)
capybara            1.950000   0.890000  24.080000 ( 79.920051)

Note: Capybara doesn’t always require a ‘session’; you only need one when the application under test is not a ruby rack application. Since my example (a Google form) is not a rack application, like most of the applications I test, my example must use the session.
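To put the timings above in per-run terms, a quick calculation helps. This is only a sketch: it assumes the ‘real’ column is the wall-clock total across all ten runs (the `run 10` line), and the constant names are mine.

```ruby
# Wall-clock ("real") seconds copied from the results table above.
# Assumption: each figure is the total for all 10 runs.
REAL_SECONDS = {
  'selenium-webdriver' => 73.123340,
  'watir-webdriver'    => 79.388494,
  'capybara'           => 79.920051
}
RUNS = 10

baseline = REAL_SECONDS['selenium-webdriver']

REAL_SECONDS.each do |api, total|
  per_run  = total / RUNS
  overhead = (total / baseline - 1) * 100 # percent slower than the baseline
  puts format('%-20s %.2fs per run, %+.1f%% vs selenium-webdriver', api, per_run, overhead)
end
```

On these numbers the two higher level APIs come out roughly 9% slower in wall-clock time, which is well under a second per run. A single benchmark on one machine doesn’t say much beyond that.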

When using ruby, why Watir-WebDriver is my sweet spot

I personally find Watir-WebDriver to be the most elegant solution, as the API is high enough for me to be highly readable/usable, but low enough to be powerful and for me to feel like I’m in control.

For example, being able to select an element by an explicit identifier (name, class name, id, anything) is a huge deal to me. I personally don’t like relying on the API to determine which selector to use: for example, Capybara’s fill_in only supports name, id and label, and you can’t tell it which specific one to choose: it appears to try each selector one by one until it finds a match.
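Here is a toy model of why that matters, written in plain ruby so it runs without a browser. It is not Capybara’s implementation: the `Field` struct, the two helper methods and the id, name, label lookup order are all assumptions made for illustration.

```ruby
# A toy model of a form: each field has an id, a name and a label.
Field = Struct.new(:id, :name, :label)

FIELDS = [
  Field.new('entry_0', 'email', 'Contact email'),
  Field.new('email', 'entry_1', 'Newsletter email')
]

# Explicit lookup: the caller says which attribute to match on.
def find_by(attribute, value)
  FIELDS.find { |field| field[attribute] == value }
end

# Guessing lookup: try id, then name, then label, and return the first hit.
def find_by_any(locator)
  [:id, :name, :label].each do |attribute|
    match = find_by(attribute, locator)
    return match if match
  end
  nil
end

puts find_by(:name, 'email').label # the field whose name is 'email'
puts find_by_any('email').label    # the field whose id is 'email' wins instead
```

When one field’s id happens to equal another field’s name, the guessing lookup quietly picks a different field from the one an explicit selector would.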

I have found that Watir-WebDriver also provides lots of flexibility/neatness. For example: it’s the only API shown here that allows URLs to not have a ‘http://’ prefix (how many people do you know who type http:// into a browser?).
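If you are using one of the other APIs, the same nicety is only a few lines away. This is a minimal sketch with a hypothetical helper name, `with_default_scheme`; it is not taken from Watir-WebDriver’s source.

```ruby
# Hypothetical helper: prepend "http://" when the URL has no scheme.
def with_default_scheme(url)
  url =~ %r{\A[a-z][a-z0-9+.-]*://}i ? url : "http://#{url}"
end

puts with_default_scheme('bit.ly/watir-webdriver-demo')
puts with_default_scheme('https://example.com/login') # already has a scheme, left alone
```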

In my opinion, the high level APIs like Capybara don’t provide enough control (for example, being able to specify the explicit selector), but the low level APIs like webdriver don’t provide enough functionality. This is evident when I am using a language other than ruby (like C#), where I find myself writing a large number of web element extension methods because webdriver doesn’t provide any of them. A .set method is a classic example: even Simon Stewart writes a clearAndType method in his examples, even though he wrote webdriver, which sadly lacks one (you must call .clear and then .send_keys).
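The kind of helper I mean looks something like this in ruby. It is a sketch only: `set_text` is a hypothetical name, and `FakeElement` is a stand-in so the example runs without a browser (a real selenium-webdriver element responds to the same `clear` and `send_keys` calls).

```ruby
# Stand-in for a WebDriver element so the sketch runs without a browser.
class FakeElement
  attr_reader :value, :calls

  def initialize(value = '')
    @value = value
    @calls = []
  end

  def clear
    @calls << :clear
    @value = ''
  end

  def send_keys(text)
    @calls << :send_keys
    @value += text # typing appends to whatever is already there
  end
end

# Hypothetical helper: replace the field's contents rather than append to them.
def set_text(element, text)
  element.clear
  element.send_keys(text)
  element
end

field = FakeElement.new('1')
field.send_keys('2')
puts field.value # without clear, the old text is still there
set_text(field, '2')
puts field.value
```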

My biggest concern about high level field APIs

But my biggest issue with the high level APIs is that I’ve frequently seen them used to write test scripts that are step by step interactions with a web form. Instead of thinking of a business application as a business application, people see it as a series of forms that you ‘fill in’. This means people create scenarios like the one Aslak Hellesøy included in his recent post about cucumber web steps (which use Capybara) and the problems they have created.

Scenario: Successful login
  Given a user "Aslak" with password "xyz"
  And I am on the login page
  And I fill in "User name" with "Aslak"
  And I fill in "Password" with "xyz"
  When I press "Log in"
  Then I should see "Welcome, Aslak"

I’m not saying it’s not possible to end up with something as ugly as above using other APIs, but I am saying the web form DSL style naturally relates to this: as the APIs look so similar to this style because that’s what the DSL was designed for: filling in forms. I’ve seen people frequently write generic, reusable cucumber steps to match the web form DSL like:

When /^I fill in "(.+)" with "(.+)"$/ do |field, value|
  fill_in field, :with => value
end

But this means you end up with less readable, less maintainable test scripts rather than business readable executable specifications.
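One way out is to keep the field-level detail in a page object, so a scenario can say something like “When I log in as Aslak” and leave the typing to the code. The sketch below is an assumption-laden illustration: `LoginPage`, `log_in_as` and the selector names are hypothetical, and `FakeBrowser` only mimics the shape of the Watir calls shown earlier so that it runs without a browser.

```ruby
# Records interactions so the sketch runs without a browser.
class FakeBrowser
  # Minimal stand-in for a text field or button.
  Control = Struct.new(:log, :selector) do
    def set(text)
      log << [:set, selector, text]
    end

    def click
      log << [:click, selector]
    end
  end

  attr_reader :log

  def initialize
    @log = []
  end

  def text_field(selector)
    Control.new(@log, selector)
  end

  def button(selector)
    Control.new(@log, selector)
  end
end

# Hypothetical page object: the selectors live here, not in the scenario.
class LoginPage
  def initialize(browser)
    @browser = browser
  end

  def log_in_as(user_name, password)
    @browser.text_field(:name => 'user_name').set user_name
    @browser.text_field(:name => 'password').set password
    @browser.button(:name => 'log_in').click
  end
end

browser = FakeBrowser.new
LoginPage.new(browser).log_in_as('Aslak', 'xyz')
browser.log.each { |entry| p entry }
```

A step definition then becomes a one-line call to `log_in_as`, and a change to the login form touches one class rather than every scenario.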


Ultimately what I am looking for in an automated web testing API is simplicity and full control. I personally find browser APIs like Watir-WebDriver and Watir give me this, and this is why I love them so. Your mileage may vary, you may like different styles of APIs better, but I’ve seen other APIs so badly abused by people not even thinking about it, so it makes sense to think about what you’re trying to achieve and whether what you’re doing is the right way.

Author: Alister Scott

Alister is an Excellence Wrangler for Automattic.

10 thoughts on “A tale of three ruby automated testing APIs (redux)”

  1. In your last section you explained that cucumber example is somewhat anti-pattern, which I agree with you 100%. But I cannot seem to understand why you end up writing tests like this when using capybara, while you won’t while using watir? Can you enlighten me on this? Thanks


    1. As I say in the article:

      I’m not saying it’s not possible to end up with something as ugly as above using other APIs, but I am saying the web form DSL style naturally relates to this: as the APIs look so similar to this style because that’s what the DSL was designed for: filling in forms.

      What I have seen is people start writing tests in capybara style, fill in this, fill in that, then convert to something like cucumber and replicate that same style.
      I rarely see this happen when people use the watir API as their tests aren’t in that same web form DSL style to begin with.


  2. Great article. I agree with the main concepts here. Two things I want to add. First, the cuke tests can create their own page objects to hide the field interaction when they don’t want that. My former co-workers Jean-Philippe Boucharlat and David Cooper told me about encapsulating the interface activity in functions so that the script only describes the business action. David called it goal-oriented testing because there is a business goal in mind. That would mean the API used is hidden from the script writer. The second is that the tool used typically depends on the need. I may only need sausage and be limited for time, so it’s the best choice this time. I may be dealing with a new kind of recipe, so the basic carcass is the best choice. It’s good to have all three tools in the toolbox depending on the need. At my last employer, we used webdriver to test interface features and watir to test business logic.

    I appreciate the research that you have put into this and sharing your analysis. There’s no need for people to get worked up about it.


  3. Is it possible to bypass the watir-webdriver API when working with a page that was loaded in a watir-webdriver browser object? I’d like to be able to send commands directly to Webdriver in its own syntax, within my Watir scripts… not on a regular basis, but in order to take advantage of functionality that WebDriver supports that wasn’t included in watir-webdriver (such as more advanced handling of alert boxes)



  4. You’re extremely helpful… Thank you. Now that I see that there’s an easy access to the “carcass” of Selenium, I see no real advantages to using Selenium 2 without the improved readability of watir-webdriver.


  5. I found the API analogy quite useful because it amusingly displays the concept of fine-grained-control vs. ease-of-use spectrum across the various tools. However, when approaching these tools from a more holistic perspective, I can see how the higher-level Capybara DSL could really shine when combined with a driver for Watir. When using Cucumber feature specifications, there is some need to be more aware that the idea is to separate the test fixtures from the feature descriptions.

    I haven’t seen any examples yet of using Capybara with Watir, and I’m looking for something that would allow use of the great high-level Capybara DSL, but with a driver for Watir. I’ve found a very small project called ‘capybara-celerity’ which seems to be a basic driver to support using Celerity rather than the Selenium or Rack Test drivers. Maybe I’m connecting some dots I see with lines describing implementation details that do not quite exist yet. However, I would like to point out the observation that Celerity is supposed to be API compatible with Watir. In theory a similar driver could be written for Watir.

    This would open the doors to using the great higher level DSL within the step definitions (implementation details of the tests), while still being able to swap out Watir for Selenium or Celerity. All with the added benefit of using Cucumber to specify the features in a readable english language format. As with any language, the level of detail supported by the language is proportional to the amount of specificity and declarative control it allows. These colorful details are usually added over time.

    Therefore, I propose that the either-or dichotomy between simplicity and full control can become a more colorful and unified both-and BDD testing environment. We can have our cake and eat it too! ^_^


  6. This is Awesome comparison. Thanks for your research.
    Coming completely from selenium-webdriver + Java world, I would like to learn more about advantages of watir-webdriver. I have two questions:
    1. Can you comment on disadvantages of using selenium-webdriver other than code being too verbose? Also, looking at the time taken to run these tests, selenium-webdriver took less time, does this mean that selenium-webdriver is more efficient?
    2. How good is watij when compared with Selenium-webdriver? I read at places that watij is not as rich as watir.

    Thanks again for sharing the information, really appreciate your effort and time.


