Writing Resilient Unit Tests

By Andrew Wilcox

I happened to be looking at a unit test I wrote a couple months ago and was dismayed to realize it wasn’t working any longer:

test("don't launch the rocket in stormy weather", function () {
    setup_rocket_systems_controller();
    setup_fake_ground_based_internet();
    set_weather_to_stormy();
    equal(ready_to_launch(), false);
});

Today this test will always go green — regardless of whether the code launches the rocket in stormy weather or not.

My unit test worked when I first wrote it.  But my test was also fragile: it was easily broken in the normal course of code development.

Here’s the code being tested as it appears today:

function ready_to_launch() {
    var controller = connect_to_rocket_systems_controller();
    if (! controller) {
        return false;
    }

    if (controller.get_fuel_level() < needed_fuel()) {
        return false;
    }

    var internet = controller.connect_to_ground_based_internet();
    if (! internet) {
        return false;
    }

    if (internet.obtain_weather_report() == WEATHER_STORMY) {
        return false;
    }

    return true;
}

Of course we could argue whether this is the best coding style to use: whether each check could be split out into a separate function, or whether exceptions should be thrown instead of returning false, and so on.  But with working unit tests such improvements in design are easy to make.

Without unit tests (or with broken unit tests!) changing the code can easily break things without us noticing.  With my broken unit test, someone working on improving or enhancing the code could make a small typo that broke the check that prevents launches in stormy weather, and they would think everything was OK because, after all, there's clearly a unit test that checks for that!

Why did my unit test break?  What happened was:

  • We noticed that launching our rockets without enough fuel often resulted in a crash;

  • An engineer carefully reviewed all the changes that would be needed to implement a fix, and wrote unit tests to check that the code would not launch the rocket when it didn’t have enough fuel;

  • A couple places in the code that needed to be updated were overlooked but caught by unit tests going red, and so were quickly found and fixed;

  • A few unit tests broke by going red because they themselves now needed to fuel the rocket in order to do their test, but they were also quickly found and fixed;

  • My unit test also now needed to fuel the rocket to do its test, but what it checked was just that the rocket wasn’t ready to launch… and so it was now going green not because of stormy weather but because it was failing to fuel the rocket!
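
My test, unchanged, kept going green for the wrong reason.  To keep exercising the weather check it would also have needed a fueling step, something along these lines (set_fuel_level_to_full() is a hypothetical helper; the real setup would depend on our test harness):

test("don't launch the rocket in stormy weather", function () {
    setup_rocket_systems_controller();
    set_fuel_level_to_full();   // the step my test was missing
    setup_fake_ground_based_internet();
    set_weather_to_stormy();
    equal(ready_to_launch(), false);
});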

At this point the code to check for stormy weather could be inadvertently broken by further code changes and no one might notice.  We could even delete the check for stormy weather entirely and my unit test would still go green!  Bit rot has set in… there’s impressive looking code to avoid launching the rocket when we’re not supposed to… and impressive looking unit tests to test that… and yet when stormy weather comes we could actually launch the rocket anyway.

There are two ways for a unit test to be broken, false-red tests and false-green tests:

  • false-red tests go red even if the code feature they’re testing is correct;

  • false-green tests go green even if the code they’re testing is incorrect.

Because we run all unit tests automatically before pushing to production, red tests are quickly found and fixed.  Thus any buggy unit tests that might be able to sneak into the code base are the false-green ones: the ones that say “fine!  everything’s fine!  no problem!”… even when the code is broken.

“Happy path” unit tests rarely break by going false-green.  A happy path test is one that checks that the desired result is achieved when all the necessary conditions are in place.  E.g., we launch our rocket, and it makes it to low Earth orbit, and it inserts into the correct orbit, and it matches velocity with the space station, and it docks without crashing, etc. etc.  Or, in the case of IMVU, a happy path test would be that customers can place orders when they have enough credits, the product is available, and so on.
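
Sketched in the same QUnit style as the rocket tests, and with helpers and fields I'm inventing purely for illustration, such a happy path test might look like:

test("customer can place an order", function () {
    var customer = setup_customer_with_credits(1000);    // hypothetical setup helpers
    var product = setup_available_product({ price: 500 });
    var order = place_order(customer, product);
    equal(order.status, "completed");
    equal(customer.credits_remaining(), 500);
});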

Happy path unit tests rarely go false-green because of entropy: there are typically many more ways for something to fail than there are for something to succeed.  It’s possible that I might manage to break a test and break the code at the same time and have my changes conspire to get the test to go green anyway…

test("2 + 2", function () {
    equal(plus(2, 2), 5);
});

but it’s rather unlikely.

“Sad path” unit tests on the other hand check that things do in fact fail when they’re supposed to.  “The rocket can’t be launched in stormy weather” is one example.  Another example would be “customers can’t buy a virtual product when they don’t have enough credits”.
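
A first attempt at the credits example, again with invented helpers, might simply check that the order didn't go through:

test("customers can't buy a virtual product when they don't have enough credits", function () {
    var customer = setup_customer_with_credits(100);     // hypothetical helpers, as above
    var product = setup_available_product({ price: 500 });
    equal(place_order(customer, product), null);          // assumes a failed order returns null
});

As written it only checks that something failed, which is exactly the weakness discussed below.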

Sad path unit tests are often important for the integrity of the business.  Suppose the check that prevents customers from buying a product with insufficient credits wasn't working.  We might not notice this just by using the application ourselves, because we don't generally try to buy things when we don't have enough credits.

But though they are often quite important, these sad path unit tests are more easily broken as the code continues to be developed because there are many ways that things can fail — and if the test is just checking for “something failed” then lots of things might be able to do that.  For example, this unit test will go green if the code throws any exception:

test("names must be a string", function () {
    raises(function () { set_name(3.14159) });
});

and so it will be broken by any bug that manifests by throwing an exception.  In fact when I review existing unit tests that merely check if an exception has been thrown, I’ve found that they’ve become useless more often than not!

Thus extra care is needed when writing sad path tests.  It’s not enough to just check whether the code failed — we need to check whether it specifically failed for the reason that we’re testing for.

For this we need to do an analysis: are there any other code paths that could generate the same result as we’re testing for?  For example, it is better to check for the specific exception being thrown — but that could still be buggy if there’s more than one way that specific exception could be thrown.
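
For instance, in the QUnit style used above, raises() can also be given the error it should expect as its second argument, so the earlier test could be tightened to name that error.  Here I'm assuming, purely for illustration, that set_name signals a non-string name with a TypeError:

test("names must be a string", function () {
    raises(
        function () { set_name(3.14159); },
        TypeError,   // assumption: set_name throws a TypeError for non-string names
        "a non-string name should be rejected with a TypeError"
    );
});

Even then, the test is only as good as our analysis of whether anything else could throw that same error.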

In the rocket example, one option would be to instrument the code to make it more testable by having it return a string indicating why we’re not ready to launch:

function ready_to_launch() {
    var controller = connect_to_rocket_systems_controller();
    if (! controller) {
        return "the rocket systems controller is offline";
    }

    if (controller.get_fuel_level() < needed_fuel()) {
        return "there is insufficient fuel to launch";
    }

    var internet = controller.connect_to_ground_based_internet();
    if (! internet) {
        return "the ground based Internet service is not available";
    }

    if (internet.obtain_weather_report() == WEATHER_STORMY) {
        return "flight protocols do not permit launching in stormy weather";
    }

    return null;
}

Code will often need some enhancements to make it testable, though you may be able to find ways that make the testability requirement less intrusive.
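
One less intrusive arrangement, for example, might keep the boolean interface for production callers and put the reasons in a separate function that only the tests need to care about.  launch_blocker below is just an illustrative name, not part of the original code:

function launch_blocker() {
    var controller = connect_to_rocket_systems_controller();
    if (! controller) {
        return "the rocket systems controller is offline";
    }

    // ... the remaining checks, each returning its reason string ...

    return null;
}

function ready_to_launch() {
    return launch_blocker() == null;
}

With that arrangement the tests would compare launch_blocker()'s result against the specific reason while production callers keep getting a plain true or false; for the rest of this example, though, I'll stick with the string-returning ready_to_launch() above.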

And so if I had written my unit test at the time to check for the specific condition I was testing for:

test("don't launch the rocket in stormy weather", function () {
    ...
    set_weather_to_stormy();
    equal(
        ready_to_launch(),
        "flight protocols do not permit launching in stormy weather"
    );
});

then when the fuel requirement had been added later my test would have broken by going falsely red instead of going falsely green — and it would have been fixed.
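
Concretely, once the fuel requirement came along, the repaired test would have ended up something like this (the fueling helper is again hypothetical):

test("don't launch the rocket in stormy weather", function () {
    setup_rocket_systems_controller();
    set_fuel_level_to_full();
    setup_fake_ground_based_internet();
    set_weather_to_stormy();
    equal(
        ready_to_launch(),
        "flight protocols do not permit launching in stormy weather"
    );
});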

My test would have been a more resilient unit test — one that over time would have had a better chance of being kept working as development proceeded.