Effective software testing: the testing pyramid

Testing types on various levels.

In the software world we have an ever-growing collection of testing technologies at our disposal. Testing is commonly performed on three levels: the unit, integration and system level. Often it is not clear which functionality should be tested on which level. When exactly should we write a unit test, an integration test or a system test? Should, for instance, every feature be verified using a system test? In this blog post we revisit the testing pyramid, a best practice that helps answer exactly these questions. We also take a look at how the testing pyramid is applied internally in our Symflower project.

Shades of testing: unit, integration and system testing

Let’s briefly recap how the three testing types are defined and what their pros and cons are.

The testing pyramid.
  • Unit tests: Unit tests target the smallest building blocks of software and verify that they behave as expected. Typically, they are written to test individual functions and methods in isolation. The key advantage of unit tests is that they require as little context as possible: there is no running database, no actual deployment, nor any other component that the test needs to take care of. As a result, they are straightforward to write, run fast and can be easily debugged. With a failing unit test we also do not have to search the entire code base to locate a problem: the unit under test is where the bug must be hiding.

  • Integration tests: While the majority of bugs can be found with unit tests, there are certain problems that unit tests cannot find by definition. Think for instance of two components A and B that interact with each other. If the API of component B changes, no unit test will catch the resulting regression in the interaction, because a unit test only verifies A and B each in isolation. This brings us to integration tests: they look at two or more components and check whether the interactions among them work as expected.

  • System tests: While integration tests look at more than one component, they do not cover the entire system. That is what system tests are for: they include the complete product and test whether it behaves as it is supposed to. System tests have the most context among the three testing types, so they take the longest to run and are the hardest to debug. After all, the underlying problem can potentially hide in any of the components of your system.
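
To make the A/B example from the list above concrete, here is a minimal sketch. `ComponentA`, `ComponentB` and the message formats are invented for illustration and have nothing to do with Symflower's code. Each unit-level check passes on its own, yet the integration-level check fails because B's expected input format changed.

```java
// Hypothetical components, invented for illustration only.
final class ComponentA {
    // Produces a message for ComponentB in the format "name=value".
    static String encode(String name, int value) {
        return name + "=" + value;
    }
}

final class ComponentB {
    // After an (assumed) API change, ComponentB expects "name:value".
    static int decodeValue(String message) {
        int separator = message.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("malformed message: " + message);
        }
        return Integer.parseInt(message.substring(separator + 1));
    }
}

final class IsolationVersusIntegration {
    // Unit-level check of A alone.
    static boolean unitTestA() {
        return ComponentA.encode("i", 1).equals("i=1");
    }

    // Unit-level check of B alone, using a hand-written input.
    static boolean unitTestB() {
        return ComponentB.decodeValue("i:1") == 1;
    }

    // Integration-level check: feeds A's real output into B.
    static boolean integrationTest() {
        try {
            return ComponentB.decodeValue(ComponentA.encode("i", 1)) == 1;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unit A passes: " + unitTestA());
        System.out.println("unit B passes: " + unitTestB());
        System.out.println("integration passes: " + integrationTest());
    }
}
```

Only the check that wires both components together exercises the format mismatch, which is exactly the kind of problem the integration level exists for.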

Does that mean we should not write system tests at all? No! But we should select carefully which functionality is tested on which level. If we can show a problem with a unit test, we use a unit test. Only if a problem cannot be shown with a unit test do we go one level up and use an integration test or even a system test. Following this guideline gives you a test suite that is effective and efficient where it counts most: debugging and fixing problems. The testing pyramid emerges naturally from this procedure: most tests are on the unit level, fewer on the integration level and the fewest on the system level.

Applying the testing pyramid to a Kubernetes microservice architecture

To not stay on the theoretical side of things, let’s take a look at how we embrace the testing pyramid internally at Symflower. From a very top-level perspective, Symflower processes source code files and generates unit test files for them, in our example for Java.

Symflower receives Java files and generates Unit Test files for them.

The server version of Symflower is a microservice architecture using Docker and Kubernetes. A very simplified depiction of how these microservices play together looks as follows:

Symflower microservice communication pipeline.

The Transpiler microservice converts a Java source code file into an intermediate representation that is passed on to the Test Value Generation component. This component in turn figures out the values for testing. Finally, the Code Generation microservice uses these test values to generate Java unit test files.
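
The data flow described above can be sketched as three functions chained together. This is only an illustration of the pipeline shape: the stage bodies are placeholders that wrap strings, and none of the names reflect Symflower's actual interfaces.

```java
import java.util.function.Function;

// Placeholder stages; the real services exchange richer data than strings.
final class PipelineSketch {
    // Stage 1: Java source -> intermediate representation.
    static final Function<String, String> TRANSPILER =
        source -> "IR(" + source + ")";

    // Stage 2: intermediate representation -> test values.
    static final Function<String, String> TEST_VALUE_GENERATION =
        ir -> "Values(" + ir + ")";

    // Stage 3: test values -> Java unit test file.
    static final Function<String, String> CODE_GENERATION =
        values -> "UnitTest(" + values + ")";

    // Runs the three stages in the order described in the text.
    static String process(String javaSource) {
        return TRANSPILER
            .andThen(TEST_VALUE_GENERATION)
            .andThen(CODE_GENERATION)
            .apply(javaSource);
    }

    public static void main(String[] args) {
        System.out.println(process("A.java"));
    }
}
```

Each stage can be tested on its own with a hand-written input, while `process` is the smallest unit that exercises the hand-over between stages.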

The testing pyramid for this architecture looks as follows:

  • Unit tests: target a single microservice, or a function within that microservice, in isolation. For the Transpiler service, for example, this means that we verify whether certain source code files can be read in correctly. If a unit test fails, we know that it can only involve one component: the component under test.
  • Integration tests: evaluate whether the three microservices work properly together. There is no need for any Kubernetes deployment to verify that. The three binaries representing our microservices are simply plugged together, and we verify whether a Java file is processed correctly and results in a correct Java unit test file. If the test files generated by Symflower fail unexpectedly, there must be something wrong. Maybe the “Test Value Generation” obtained false values or the intermediate representation from the “Transpiler” service was faulty. However, if we have sufficient unit tests for these components, we know that it must be a problem with the integration, e.g. the communication, of the components.
  • System tests: In order to run system tests, a deployment of the complete product is necessary. In our case, a Kubernetes cluster is deployed that orchestrates our three microservices: Transpiler, Test Value Generation and Code Generation. If a test fails on this level but not on the unit or integration level, we know that it must be an environment problem. After all, if the behavior has been fully validated on the unit level, and the interaction of the components has been validated on the integration level, only the environment is new on this level.

How to test a new feature using the testing pyramid

Let’s assume Symflower learns how to process an if-statement. How would we test this new feature by applying the testing pyramid?

In Java an if-statement can have many different appearances:

static int simplestIf(int i) {
    if (i == 1) {
        return 0;
    }

    return i * 2;
}

static int ifWithoutABlock(int i) {
    if (i == 1)
        return 0;
    return i * 2;
}

static int ifWithElse(int i) {
    if (i == 1) {
        return 0;
    } else {
        return 1;
    }
}

static int cascadingIfElse(int i) {
    if (i == 1) {
        return 2;
    } else if (i == 3) {
        return 4;
    } else {
        return 5;
    }
}
…

First, we want to add tests on the unit level for this functionality. We need to consider which test cases are important for each component. The Transpiler, for instance, needs to be able to “read” the above examples, but also countless formatting variations: different placements of braces, newlines and spaces all need to be supported. In essence, we need to go crazy and test all relevant versions of if-statements to make our Transpiler component bulletproof. The Code Generation component, on the other hand, does not really care about the formatting of the function under test. For instance, the generated values for simplestIf and ifWithoutABlock are identical.
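
As a small illustration of why formatting only matters to the Transpiler, the sketch below checks that `simplestIf` and `ifWithoutABlock` from the examples above behave identically. The helper `agreeOn` and the chosen input values are assumptions made for this sketch, not values produced by Symflower.

```java
final class FormattingEquivalence {
    // Copied from the examples above: if-statement with a block.
    static int simplestIf(int i) {
        if (i == 1) {
            return 0;
        }

        return i * 2;
    }

    // Copied from the examples above: if-statement without a block.
    static int ifWithoutABlock(int i) {
        if (i == 1)
            return 0;
        return i * 2;
    }

    // Hypothetical helper: true if both variants return the same result
    // for every given input.
    static boolean agreeOn(int... inputs) {
        for (int input : inputs) {
            if (simplestIf(input) != ifWithoutABlock(input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 1 takes the if-branch, the other values take the fall-through path.
        System.out.println(agreeOn(1, 2, 0, -1));
    }
}
```

Since both variants have the same behavior, one set of test values covers both branches of either function, and only the component that parses the source text needs a test case per formatting variant.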

The integration test in the Symflower project uses a binary that directly connects the three components Transpiler, Test Value Generation and Code Generation in order to avoid a full Kubernetes deployment. The integration test executes the following procedure:

For each Java file `f` in test data
   - Send `f` to the integration-test-binary
   - Ensure there are no unexpected problems when executing the binary
   - Take the generated `Java unit test file` and execute it
   - Make sure there are only expected exceptions
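
The procedure above could be driven by a loop like the following sketch. The `Generator` and `TestExecutor` interfaces are hypothetical stand-ins for the integration-test binary and for compiling and running a generated test file; how Symflower actually invokes either is not shown in this post.

```java
import java.util.ArrayList;
import java.util.List;

final class IntegrationRunSketch {
    // Hypothetical stand-in for the integration-test binary:
    // takes a Java source file, returns the generated unit test file.
    interface Generator {
        String generate(String javaSource) throws Exception;
    }

    // Hypothetical stand-in for executing a generated test file:
    // returns the names of all exceptions that were not expected.
    interface TestExecutor {
        List<String> unexpectedExceptions(String generatedTest);
    }

    // Runs the procedure for every file and collects failure messages.
    static List<String> run(List<String> sources, Generator generator, TestExecutor executor) {
        List<String> failures = new ArrayList<>();
        for (String source : sources) {
            String generated;
            try {
                // Send the file to the binary and check for unexpected problems.
                generated = generator.generate(source);
            } catch (Exception e) {
                failures.add("generation failed for: " + source);
                continue;
            }
            // Execute the generated test and allow only expected exceptions.
            for (String unexpected : executor.unexpectedExceptions(generated)) {
                failures.add("unexpected " + unexpected + " for: " + source);
            }
        }
        return failures;
    }
}
```

An empty result means every file passed through all three components and produced a test file that behaves as expected.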

This procedure allows us to test the correct integration of our three microservices by simply reusing the files and fixtures used for unit testing. However, we could also keep the integration test suite minimal and pick only one case per interaction path.

Finally, we need to decide on the required system tests for if-statements. We do not need separate system tests for all the different options of formatting an if-statement. Dealing with those is a pretty isolated problem for the Transpiler microservice and results in the same intermediate representation anyway. But we do need at least one system test that ensures that the if-statement correctly passes through all three microservices, this time inside the real system environment, i.e. with Docker and Kubernetes.

With this setup, when a unit test breaks, usually an integration test and probably the system test for if-statements will break as well. The good thing is that we do not need to start debugging either the system or the integration tests as long as there is a failing unit test. That is, we always look at the smallest context that we need for fixing a problem. Ideally, once all unit tests and integration tests pass, a failing system test means there is actually a problem with the deployment and not, for instance, with how if-statements are handled by the Transpiler.
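
The debugging order described above boils down to "start at the lowest failing level". A minimal sketch of that rule, with names invented for illustration:

```java
final class TriageSketch {
    enum Level { UNIT, INTEGRATION, SYSTEM, NONE }

    // Returns the lowest level with a failing test, i.e. the smallest
    // context in which debugging should start. NONE means all levels pass.
    static Level startDebuggingAt(boolean unitPassed, boolean integrationPassed, boolean systemPassed) {
        if (!unitPassed) {
            return Level.UNIT;
        }
        if (!integrationPassed) {
            return Level.INTEGRATION;
        }
        if (!systemPassed) {
            return Level.SYSTEM;
        }
        return Level.NONE;
    }

    public static void main(String[] args) {
        // A broken unit test usually breaks the higher levels too,
        // but debugging still starts at the unit level.
        System.out.println(startDebuggingAt(false, false, false));
    }
}
```

If only the system level fails, the rule points at the system level, which in the setup described here suggests a deployment or environment problem rather than a bug in one of the components.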

Conclusion

The testing pyramid is a best practice that gives us the best ratio of time spent on testing and debugging to the likelihood of finding bugs. For failing tests we work our way up the testing pyramid: all unit tests need to pass before it makes sense to start debugging either the integration or the system tests.


Technical | 2022-06-03