100% Code Coverage?

Which codebase is better?

Our first codebase has 100% unit test coverage (all paths of the code are executed at least once during test execution):

// divide/index.js
export default (a, b) => {
  return a / b;
};

// divide/index.test.js
import divide from '.';
import expect from 'expect';

describe('myAwesomeCalculator', () => {
  it('can divide two numbers', () => {
    expect(divide(35, 7)).toBe(5);
  });
});

Our second codebase has only 50% unit test coverage (only half the paths through the code are executed during test execution):

// divide/index.js
export default (a, b) => {
  if (b === 0) {
    throw new Error('Cannot divide by 0');
  }
  return a / b;
};

// divide/index.test.js
import divide from '.';
import expect from 'expect';
import assert from 'assert';

describe('myAwesomeCalculator', () => {
  it('cannot divide by zero', () => {
    try {
      divide(5, 0);
      assert.fail('exception was not thrown');
    } catch (error) {
      expect(error).toEqual(new Error('Cannot divide by 0'));
    }
  });
});

I’ve met a lot of managers who would choose the first codebase because it has 100% unit test coverage, which surely must be better than code that is only 50% tested!

But code coverage is often a watermelon: green on the outside, red on the inside!

Higher code coverage doesn’t imply higher quality code, because it doesn’t mean higher quality tests. The accuracy and coverage of the test oracles you use determine the quality of the tests.

Beware of people boasting of high test coverage! They may only be looking at the outside of the watermelon.

Author: Alister Scott

Alister is an Excellence Wrangler for Automattic.

11 thoughts on “100% Code Coverage?”

  1. That watermelon analogy, priceless :)
    Totally agree with your post. Coverage doesn’t mean much to me, since it says nothing about the quality of the code; it just measures how much of it was exercised by tests. 100% Coverage !== 100% Customer Satisfaction. I prefer to focus on improving the latter.


  2. I’m definitely not in favor of code coverage being the *sole* measure of code quality – note that analysis tools like SonarQube or CodeClimate count coverage as just one criterion among many. Code coverage measurements and improvements are more meaningful in a context where the quality of the tests is routinely scrutinized during code reviews.

    However, on my own team I have dealt with resistance to measuring coverage at all due to the sentiments expressed in this blog post. I’m just as opposed to that as I am to the abuse or poor use of coverage stats. The potential for abuse of a thing doesn’t ipso facto negate the value of the thing. In my experience, production bugs have been less frequent and customer satisfaction higher on projects where coverage was relatively high, compared to those where it wasn’t measured at all or was relatively low.


  3. Code coverage to me is like a pretty shopping bag. It may look good on the outside but inside it may be full of… well, you know. I think you and Jason hit it on the head though. It is a matter of the oracle you use to determine how valid/good your tests are, and having constant review and scrutiny of them, that can help to determine where things stand.

    But in general in testing this is all true. “Quality” is in the eye of the beholder, with some very myopic views at times, and we need to be ever vigilant in our efforts to provide a sound product. With the complexities of software we cannot test for all scenarios (logic and data) and achieve 100% coverage of the code. We can only use Risk Management techniques to minimize the impacts and provide the best possible system to our end users.


    1. This really is a good issue, Alister. There is a third option: the second codebase with two test cases, one testing a normal division and another testing a division by 0. That option is the best of all.

      A manager should never choose the first option over the second, because he/she would be getting a worse codebase. I agree with Jason that more measures are necessary to be sure about the code quality.

      But what happens with TDD? For me the answer is that the first test suite is incomplete: a test case for dividing by 0 would be mandatory.
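      That third option might be sketched like this (using Node’s built-in test runner, `node:test`, Node 18+, rather than the post’s mocha/expect setup):

      ```javascript
      const test = require('node:test');
      const assert = require('node:assert');

      // The defensive divide from the second codebase...
      const divide = (a, b) => {
        if (b === 0) {
          throw new Error('Cannot divide by 0');
        }
        return a / b;
      };

      // ...with tests for both paths: 100% coverage backed by real oracles.
      test('can divide two numbers', () => {
        assert.strictEqual(divide(35, 7), 5);
      });

      test('cannot divide by zero', () => {
        assert.throws(() => divide(5, 0), new Error('Cannot divide by 0'));
      });
      ```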


      1. Thanks Maria. It’s an oversimplified example but I’m trying to highlight that most managers won’t read the code but instead rely on an external metric (code coverage) which can give a flawed impression of quality.
        Neither solution is ideal – as you said, a third option combining the two would be necessary – and there’s also some weird behaviour in JavaScript when you divide zero by itself.


  4. In this particular case, I would probably choose the first code base; not because of the tests, but rather because in JavaScript we _can_ divide by zero and the second function breaks that assumption. Instead of returning `Infinity`, which I would expect, it crashes my program 😉

    It seems like targeting 85%+ code coverage is a reasonable goal. Subjective inspection and review may be more efficacious for uncovering bugs and subtle misbehaviors, but at least the unit tests, if complete enough, can automatically indicate if a certain set of changes broke something unexpected.

    Thanks for the article!


    1. Thanks for your comment. I was just trying to show a point but you’re right that JavaScript does allow you to divide by zero – the second codebase has at least thought of this. The fact that you can divide by zero in JavaScript highlights what an odd language it is when mathematically it isn’t possible. What makes it worse is if you divide zero by zero you get NaN #wat
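      Those behaviours are easy to check in Node (a quick sketch of JavaScript’s IEEE 754 division semantics):

      ```javascript
      console.log(5 / 0);               // Infinity
      console.log(-5 / 0);              // -Infinity
      console.log(0 / 0);               // NaN
      console.log(Number.isNaN(0 / 0)); // true
      ```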


      1. Well, for what it’s worth, the choice of Infinity and NaN isn’t all that ridiculous, and I’d hardly consider it an example of how crazy JavaScript is.

        The limit of dividing anything as the divisor approaches zero is ∞, just as the limit of dividing two numbers, each of which is approaching zero at the same rate, is nonsense. Of course, if one of the numbers is approaching zero faster, then we would obviously have a valid limit again at either zero or infinity.

        In this sense, JavaScript is actually more sane than many languages which simply freak out and break. What is the difference between throwing an exception when attempting to divide by zero and returning NaN or Infinity? We have to handle both cases or at least be aware when dividing that this is a possibility. JavaScript gives us a couple more tools to analyze the result and make a decision based on it.

        Finally, let’s fix that code while assuming that any by-zero divisions are the result of rounding errors and not intentional divide-by-zeros…

        const div = ( a, b ) => a / ( b + Number.EPSILON )

        Now that’s some sane language functionality right there. 😊
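        For what it’s worth, here is a sketch of what that EPSILON version actually does: by-zero divisions return a huge finite number instead of Infinity, while normal divisions are unchanged (EPSILON is too small to shift 7):

        ```javascript
        const div = (a, b) => a / (b + Number.EPSILON);

        console.log(div(35, 7)); // 5 — adding EPSILON to 7 rounds back to exactly 7
        console.log(div(1, 0));  // 4503599627370496 — i.e. 1 / Number.EPSILON = 2**52
        ```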


        1. Sorry, I tried to pick an example to illustrate the point that test coverage doesn’t necessarily mean something is tested well. Probably a bad choice to choose divide by zero as it seems contentious, but I wanted something that’s common enough for most people to understand. Cheers


  5. So this is something I’ve been saying for years. Most unit test frameworks let you write ‘tests’ that have not a single Assert/Expect/when/{validation-of-choice}. They don’t raise a single red flag at that; as far as the frameworks are concerned, they are valid tests. There is nothing in them that says “hey, you never tested anything, this is not a valid test, just a bunch of instructions”. It is entirely possible to get 100% code coverage from unit, integration, or end-to-end tests where a test will never fail no matter how badly the code is actually broken.

    Several years ago I was working at a place where a prior ‘tester’ had written automation (in MS VisualTest, if that tells you how long ago) that had at most maybe 10 validations total. Big, long scripts that followed entire user scenarios and would run the product through a ton of actions, but the odds of catching anything being broken were pretty low.

    I love the watermelon analogy. To me the distinction is that code coverage tells you ONLY how much of your code was exercised. It tells you zero about how much of it was actually tested. Yes, it can tell you that the .pullup() method was called, but it won’t tell you if the chin cleared the bar or not. Only by knowing in depth the quality of your tests (and, as you well point out, your test oracles) can you know whether the method just went through the steps, or whether someone was watching to ensure it got the right result as that happened.

    A lot of managers with little coding experience want to believe that “90% code coverage means 90% of our code is tested.” The trick is to get them to understand that really “90% code coverage means the tests caused 90% of the code to be exercised.” Without knowing the tests, you can’t know how well the code was tested during that process.

    I believe there are even urban legends among the geek community about projects where a boss insisted on ‘100% code coverage from unit tests or {some dire threat}’ and some percentage of the way through as things got harder to actually ‘test’ or the deadline loomed, the devs stopped writing tests with assertions, and merely started calling every method in the system until they got to the point that 100% of the code was exercised when they ran the “tests”.


  6. Personally, I do like having high test coverage, but like many here I don’t think having 100% coverage means you’ve got your code figured out. Adding mutation testing to the code base and keeping the testing pyramid in mind always helps. Thanks for the write up :3


Comments are closed.