
Test impact analysis: Automatically run affected tests only


Test impact analysis identifies the tests affected by a code change to save time on test execution. Even a basic implementation leads to an average 29% reduction in test execution time, showcased here with a benchmark of the drop-in test command symflower test-runner. In this article, you will learn about common approaches and how they impact your daily software development process.

Sneak peek: benchmarking top Go repositories shows dramatic reductions in test execution time, regardless of the size of the repository or change:

Repository  | With test runner | Without test runner | Time saved | Time saved (%)
ebiten      | 1m26s            | 2m33s               | 1m7s       | 43.61%
tailscale   | 10m55s           | 17m39s              | 6m44s      | 38.14%
go-ethereum | 13m7s            | 20m45s              | 7m37s      | 36.76%
gonum       | 12m20s           | 19m26s              | 7m5s       | 36.51%
hugo        | 22m5s            | 24m57s              | 2m52s      | 11.49%
dolt        | 33m7s            | 36m58s              | 3m51s      | 10.44%

In the following sections, we provide a detailed analysis of benchmark results, showcase a practical implementation through a running example, and outline next steps to improve its algorithms.


Test impact analysis matters

The current standard for test execution during development, and especially during CI (Continuous Integration), is to execute all tests, regardless of what has changed. Even a quick typo fix in internal documentation (irrelevant to the user-facing product) leads to minutes or even hours of CI time, unnecessary boot-ups of multiple testing deployments, blocked resources, and laborious manual processes. All of that just to confirm that nothing has changed in the behavior of the end result. It is a wasteful use of resources and developer time, but everyone accepts it because it is automated.

Now imagine a perfect scenario where some automation identifies that the internal documentation fix for a typo does not require any testing time, checks, or deployments. It might not even require any manual processes besides a quick glance from a reviewer. A change that took hours to publish now takes seconds. That is the purpose of test impact analysis.

Identifying only the checks and tests that actually need to be performed enables:

  • Cutting execution time by running only affected checks and tests
  • Saving resources and infrastructure costs by using them only when needed
  • Accelerating software development by identifying bugs earlier, while coding

Automated test impact analysis fundamentally changes how developers write code. Remember that there was a time when editors had no continuous syntax checking. With test impact analysis, test suites are finally not as costly to execute and can be run continuously in the background while coding. This leads to early insights, allowing developers to keep their current context and stay in the flow. It can also be used to automatically fix simple mistakes right away, similar to how a grammar checker fixes typos while you type.

However, implementing test impact analysis comes with its own challenges, mainly stemming from the various levels of granularity that need to be considered. These levels are discussed in the next section.

Levels of impact analysis

Test impact analysis involves three challenges:

  • Granularity of dependencies
  • Connecting dependencies to individual test cases
  • Granularity of test execution

Advancing towards more granularity needs to be a balanced process: it doesn’t matter that a dependency graph is granular down to the characters of a file if you cannot identify and execute individual test cases.

Granularity of dependencies

With test impact analysis, the basic premise is to walk a dependency graph for each individual test. A test must be executed again even if just one of its dependencies has changed. Dependency in this case refers to relationships in the application structure, e.g. package A of project X may be using package B that’s part of project Y.

Multiple levels of dependencies need to be considered:

  • Environment (e.g. programming language and tool versions used)
  • Project
  • Module
  • Package
  • File
  • Type
  • Function
  • Control flow (e.g. if a test executes only the if-branch of an if-statement, the else-branch is not a dependency)

All these levels are not isolated and can involve higher levels, e.g. an if-branch can involve a whole new project. The more granular the analysis of dependency levels, the more granularly individual changes can be mapped to single test cases. Additionally, the better the understanding of dependencies, the more tests can be ignored at execution, e.g. knowing that internal documentation changes are not connected to any code can bring such changes to a CI runtime of 0 seconds.
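To make the most granular level concrete, consider the following hypothetical Go snippet (function and test are shown together for brevity; in a real project the test would live in a _test.go file). The test exercises only the if-branch, so with control-flow granularity a change limited to the else-branch would not require rerunning it:

package discount

import "testing"

// Discount returns the discount rate for a customer.
func Discount(premium bool) float64 {
	if premium {
		return 0.2 // TestPremiumDiscount depends only on this branch.
	}
	return 0.0 // A change here leaves TestPremiumDiscount unaffected.
}

// TestPremiumDiscount only exercises the if-branch of Discount, so the
// else-branch is not one of its dependencies.
func TestPremiumDiscount(t *testing.T) {
	if discount := Discount(true); discount != 0.2 {
		t.Errorf("got %v, want 0.2", discount)
	}
}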

There are two approaches to analyzing dependencies:

  • Static analysis (without executing tests): Fast and accurate up to a point, but granularity is limited, and it cannot paint a perfect picture since some dependencies (such as control flow) can only be resolved at runtime.
  • Dynamic analysis (executing tests): Requires executing individual tests, either via modified tools or via symbolic execution, but allows perfect granularity and a perfect connection between dependencies and tests.
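A hypothetical example of why static analysis alone cannot paint a perfect picture: the file read below is determined by an environment variable, so a static analysis sees only a dependency on "some file", while a dynamic analysis observes the concrete file being read during the test run.

package config

import "os"

// Load reads the configuration file whose path is only known at runtime.
// Static analysis cannot resolve which file a test using Load depends on.
func Load() ([]byte, error) {
	path := os.Getenv("CONFIG_FILE") // Resolved only during execution.
	return os.ReadFile(path)
}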

Connecting dependencies to individual test cases

Individual test cases are the "outermost nodes" of a fully established dependency graph: they depend on other nodes, but no node of the graph depends on them. With such a graph, the general problem of test impact analysis is solved, meaning that one can always decide whether a given test case is affected by a given change: if a dependency has changed, all tests that must be rerun can be found by traversing the graph.
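Here is a minimal sketch of that traversal (our own illustration, assuming a precomputed reverse dependency graph that maps each node to the nodes depending on it): starting from a changed node, it collects every test case that transitively depends on it.

package impact

// affectedTests returns all test cases reachable from the changed node in
// the reverse dependency graph, i.e. all tests that must be rerun.
func affectedTests(reverseDeps map[string][]string, isTest map[string]bool, changed string) []string {
	seen := map[string]bool{changed: true}
	queue := []string{changed}
	var tests []string
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if isTest[node] {
			tests = append(tests, node)
		}
		// Visit every node that depends on the current one.
		for _, dependent := range reverseDeps[node] {
			if !seen[dependent] {
				seen[dependent] = true
				queue = append(queue, dependent)
			}
		}
	}
	return tests
}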

Since dependency graphs can be enormous and can change dramatically with every keystroke, caching must be involved at all levels. Otherwise, computing which tests must be executed might take more time than actually executing them.

Granularity of test execution

Now that every test case has its dependency graph and we can identify which tests must be executed for a specific change, only one problem is left to solve: how to actually execute individual test cases. Not every testing tool allows executing individual test cases granularly. However, one can always extract the tests that should run into their own program and execute that.
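In Go, for example, individual test functions and even individual subtests can be selected with go test's -run flag, which matches regular expressions against the slash-separated test names (using a test from the example later in this article):

# Run a single test function:
go test -run '^TestCircleArea$' ./...

# Run a single subtest of that function:
go test -run '^TestCircleArea$/^Negative_radius$' ./...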

The next sections showcase a benchmark for an implementation of test impact analysis for Go repositories which applies a static analysis on the package level with an average reduction of 29% in test execution time.

Benchmarking test impact analysis for Go

To evaluate symflower test-runner on real-life projects, we searched GitHub for popular Go projects. We ran the test suites of the repositories both with and without the test runner for the 5 most recent merges.

If you want to try symflower test-runner on your own project, see the example below.

This table summarizes the results of the benchmark:

Repository  | With test runner | Without test runner | Time saved | Time saved (%)
ebiten      | 1m26s            | 2m33s               | 1m7s       | 43.61%
tailscale   | 10m55s           | 17m39s              | 6m44s      | 38.14%
go-ethereum | 13m7s            | 20m45s              | 7m37s      | 36.76%
gonum       | 12m20s           | 19m26s              | 7m5s       | 36.51%
hugo        | 22m5s            | 24m57s              | 2m52s      | 11.49%
dolt        | 33m7s            | 36m58s              | 3m51s      | 10.44%

In summary, even a simple static analysis on the package level resulted in an average 29% reduction in test execution time. This benchmark gave us deep insights into how changes in a variety of projects are connected to their tests, as detailed in the next subsections. With that information, clear next steps can be formulated to save even more time in subsequent iterations of symflower test-runner.

The next subsections go into details of the individual repositories.

Ebiten

In this repository, for the 1st, 2nd, and 4th revisions, only a subset of the packages were executed. In the 3rd revision, all tests needed to be executed because of a change in the go.mod file, resulting in a higher execution time. In the 5th revision, however, no tests needed to be run because only a Java file was modified.

Revision                                 | With test runner (s) | Without test runner (s)
26feb2623754c17107005ac6376e2b4363a7be53 | 17                   | 32
df821f01774638bf8cb78f387ad81fb0a4b5adcb | 17                   | 32
e058bb6fd323b4029a054af8a16007f31aee42d7 | 35                   | 29
15dfb02f9fa7cf633a87332c60020485a1a2e231 | 15                   | 31
a786f23e28e7f7f5eae8425d0d37656f84985e8f | 3                    | 30

Tailscale

symflower test-runner had a positive impact on test execution time in all revisions. In the 1st revision, only one package was affected by a change, resulting in a difference of approximately 210 seconds compared to running all tests. In the remaining revisions, only a subset of tests were executed, resulting in lower execution times.

Revision                                 | With test runner (s) | Without test runner (s)
a228d77f8620cb6ba693e28cda48705c8418f7e7 | 19                   | 229
0970615b1b455a94214abd61b5e79d05d6d8f9bf | 145                  | 191
0a2e5afb263ae58916871640d0ddc1549e2657e1 | 132                  | 192
209567e7a0b939cbc3b2067cb5ef5f89305bd075 | 178                  | 196
d6dfb7f242b91cac34f70cce654c0a61daf247ea | 181                  | 250

Dolt

In the 3rd and 4th revisions, all tests needed to be executed because there were changes in the go.mod file. For all other revisions, only a subset of the tests was executed.

Revision                                 | With test runner (s) | Without test runner (s)
9450b292cecba77ad71222222ef81c773ba5b5f6 | 374                  | 444
dec900cdd3e270106396f66bb131164380c3f8ec | 350                  | 422
d56b31f1c11c19e8f35e4eeb63ec9b5cf4b2f399 | 456                  | 459
b9cd0fc7fe0feb3c87b81b60ef10efbcac5d45e7 | 435                  | 451
bbeddb44fd85489779a9ccc8d0ced95a8d936eba | 372                  | 443

Go-ethereum

For every revision except the 4th one, a subset of packages were executed, resulting in lower execution times. In the 4th revision, no tests needed to be run because only a text file was changed.

Revision                                 | With test runner (s) | Without test runner (s)
d71831255da7ecb3817c9e14c9142fe9d4441d3b | 174                  | 207
88c84590057acf0bde6fc7a2fb3ca9cf593a2318 | 104                  | 200
8f4fac7b86227e3ceca095e60d58f126d692f379 | 228                  | 275
83775b1dc7f59053ec69085e746c4dd9b9be3a0a | 41                   | 248
5035f99bce9bc23db27b68dd8c4a927f9d7d2ef6 | 240                  | 316

Gonum

In each revision, test execution times decreased when using symflower test-runner. In the 1st revision, all tests were executed because the go.mod file was modified. In the 2nd, 4th, and 5th revisions, only a subset of tests were executed. In the 3rd revision, no tests were executed since only the AUTHORS and CONTRIBUTORS files of the repository were changed, resulting in a lower execution time.

Revision                                 | With test runner (s) | Without test runner (s)
1ca563a018b641e805317f1ac9ae0d37b32d162c | 181                  | 208
bdcda9a453049449163d160b98285b64ec8093a1 | 109                  | 228
a9b228ed6bdcfafd52ce8ba413595310823a0004 | 4                    | 229
1f29d7b1d1724243c9f4a156cb1e16c9cbb15de1 | 219                  | 262
f1a62e187e273b2d99f9c2a04fa8931df9c22947 | 227                  | 239

Hugo

In all revisions but the 4th one in the table below, all tests were executed because a change in the go.mod file was detected. However, in the 4th revision, we see a huge difference in the execution time. The reason is that only changes in markdown files were detected, so no tests needed to be run, resulting in an approximately 223 second reduction in the execution time.

Revision                                 | With test runner (s) | Without test runner (s)
0c453420e6fceccf36d06cea0a9e53ac6b8401ba | 399                  | 334
e99eba39e7f9f9fed454a7671635052600685cea | 335                  | 334
af0cb57aaf668278551454678eac60b17348b13c | 291                  | 279
e8cc785a589bb18c9336880d662a659f29bb57f3 | 12                   | 235
b8d5090452ee482a4191622201f1548e651753f7 | 288                  | 315

Run symflower test-runner for go test

Apply the following steps to run symflower test-runner for your own Go project:

  1. Install symflower
  2. Check that the symflower binary can be executed using symflower version
  3. Change the directory to your repository
  4. Run symflower test-runner --commit-from HEAD~ -- go test -v to run affected tests for the last commit
  5. Take a look at symflower test-runner --help for even more options
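For reference, steps 2 to 5 map to the following commands (the repository path is of course a placeholder):

symflower version                                         # Step 2: check that the binary works.
cd /path/to/your/repository                               # Step 3: switch to your repository.
symflower test-runner --commit-from HEAD~ -- go test -v   # Step 4: run affected tests for the last commit.
symflower test-runner --help                              # Step 5: list all available options.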

⚠️ Important note

To make sure that every change to production code is 100% tested, we currently recommend using symflower test-runner only for testing CI pipelines.

An example

As an example, let’s create a project that handles operations related to geometric shapes. Here’s our project structure:

shapes
├── area
│   ├── area.go
│   └── area_test.go
├── go.mod
├── go.sum
└── perimeter
    ├── perimeter.go
    └── perimeter_test.go

Function #1: The file area.go contains a function that calculates the area of a circle:

package area

import (
	"errors"
	"math"
)

func CircleArea(radius float64) (area float64, err error) {
	if radius <= 0 {
		return 0, errors.New("radius must be a positive value")
	}

	return math.Pi * radius * radius, nil
}

The file area_test.go contains the tests for the CircleArea function:

package area

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCircleArea(t *testing.T) {
	type testCase struct {
		Name string

		Radius float64

		ExpectedArea float64
		ExpectedErr  error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualArea, actualErr := CircleArea(tc.Radius)

			assert.InDelta(t, actualArea, tc.ExpectedArea, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name: "Negative radius",

		Radius: -1,

		ExpectedErr: errors.New("radius must be a positive value"),
	})
	validate(t, &testCase{
		Name: "Positive radius",

		Radius: 5.0,

		ExpectedArea: 78.5,
	})
}

Function #2: The file perimeter.go contains a function that calculates the perimeter of a circle:

package perimeter

import (
	"errors"
	"math"
)

func CirclePerimeter(radius float64) (perimeter float64, err error) {
	if radius <= 0 {
		return 0, errors.New("radius must be a positive value")
	}

	return 2 * math.Pi * radius, nil
}

The file perimeter_test.go contains the tests for the above CirclePerimeter function:

package perimeter

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCirclePerimeter(t *testing.T) {
	type testCase struct {
		Name string

		Radius float64

		ExpectedPerimeter float64
		ExpectedErr       error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualPerimeter, actualErr := CirclePerimeter(tc.Radius)

			assert.InDelta(t, actualPerimeter, tc.ExpectedPerimeter, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name: "Negative radius",

		Radius: -1,

		ExpectedErr: errors.New("radius must be a positive value"),
	})
	validate(t, &testCase{
		Name: "Positive radius",

		Radius: 5.0,

		ExpectedPerimeter: 31.4,
	})
}

Let’s initialize a Git repository within this project and commit all these files:

git init
git add .
git commit -m "Init"

We’re going to make some changes now by adding a function that calculates the area of a square, along with the corresponding tests. Let’s add the following code to area.go:

func SquareArea(side float64) (area float64, err error) {
	if side <= 0 {
		return 0, errors.New("side must be a positive value")
	}

	return side * side, nil
}

Let’s add the following tests to area_test.go:

func TestSquareArea(t *testing.T) {
	type testCase struct {
		Name string

		Side float64

		ExpectedArea float64
		ExpectedErr  error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualArea, actualErr := SquareArea(tc.Side)

			assert.InDelta(t, actualArea, tc.ExpectedArea, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name: "Negative side",

		Side: -1,

		ExpectedErr: errors.New("side must be a positive value"),
	})
	validate(t, &testCase{
		Name: "Positive side",

		Side: 5.0,

		ExpectedArea: 25.0,
	})
}

Running git status at this point, you should see the following modified files:

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   area/area.go
        modified:   area/area_test.go

no changes added to commit (use "git add" and/or "git commit -a")

We can now use symflower test-runner to run only the tests that are necessary for the changes we made. Let’s run the following command:

symflower test-runner -- go test -v

Here’s the output:

Detected changes:
 - "shapes/area/area.go"
 - "shapes/area/area_test.go"

Affected by change:
 - "shapes/area"
 - "shapes/area [shapes/area.test]"
 - "shapes/area.test"

Executing "go test -v shapes/area"
=== RUN   TestCircleArea
=== RUN   TestCircleArea/Negative_radius
=== RUN   TestCircleArea/Positive_radius
--- PASS: TestCircleArea (0.00s)
    --- PASS: TestCircleArea/Negative_radius (0.00s)
    --- PASS: TestCircleArea/Positive_radius (0.00s)
=== RUN   TestSquareArea
=== RUN   TestSquareArea/Negative_side
=== RUN   TestSquareArea/Positive_side
--- PASS: TestSquareArea (0.00s)
    --- PASS: TestSquareArea/Negative_side (0.00s)
    --- PASS: TestSquareArea/Positive_side (0.00s)
PASS
ok      shapes/area     0.003s

As you can see, the command outputs the detected changes (in our case, the changes we made to the area.go and area_test.go files) and the list of packages affected by those changes. Instead of running all the tests, the command ran only the tests from the shapes/area package, since they were the only ones affected by our changes. As mentioned above, you can also specify the commit from which changes are taken into account. Let’s commit our latest changes:

git add .
git commit -m "Square area"

If you now run symflower test-runner --commit-from HEAD~ -- go test -v, the command output is exactly the same as the one above.

The algorithm inside symflower test-runner does the following (a sketch of this selection logic follows the list):

  • It performs a git diff to retrieve the list of all files that have changed since the specified commit revision (or HEAD if none was specified).
  • Each package in the project is analyzed to check whether it has a direct or indirect package dependency containing a change found by the git diff command.
  • If so, the package name is collected so that its tests are executed. If a go.mod file is modified, the command runs all tests.
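The following is a minimal sketch of that selection logic (our own illustration based on the description above, not the actual symflower test-runner source). It assumes it is run from the repository root and shells out to git and go list:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// run executes a command and returns its whitespace-separated output tokens.
func run(name string, args ...string) []string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		panic(err)
	}
	return strings.Fields(string(out))
}

func main() {
	// Step 1: list all files changed since the given revision.
	changed := run("git", "diff", "--name-only", "HEAD~")

	// A modified go.mod invalidates everything: select all packages.
	for _, file := range changed {
		if filepath.Base(file) == "go.mod" {
			fmt.Println(strings.Join(run("go", "list", "./..."), " "))
			return
		}
	}

	// Map changed files to the directories (i.e. packages) containing them.
	changedDirs := map[string]bool{}
	for _, file := range changed {
		changedDirs[filepath.Dir(file)] = true
	}

	// Step 2: select every package whose transitive dependency closure
	// contains a changed directory ("go list -deps" includes the package itself).
	workingDir, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	var affected []string
	for _, pkg := range run("go", "list", "./...") {
		for _, dir := range run("go", "list", "-deps", "-f", "{{.Dir}}", pkg) {
			if relative, err := filepath.Rel(workingDir, dir); err == nil && changedDirs[relative] {
				affected = append(affected, pkg)
				break
			}
		}
	}

	// Step 3: the selected packages are the ones whose tests need to run.
	fmt.Println(strings.Join(affected, " "))
}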

This approach already works great. However, it could be much more granular, and it can also overlook changes that require the execution of some or even all tests. For example, changing a non-Go file that is used in a test (e.g. reading a configuration file) requires the test to be executed. Most of the time, such artifacts cannot be determined through static analysis. Additionally, if configurations (e.g. tool versions) are changed, usually all tests should be executed; currently, only a check for a go.mod file is implemented. These are just some examples of how test impact analysis can be improved. We are addressing a range of such scenarios in the next iteration, described in the section below.
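As a concrete (hypothetical) illustration of the configuration-file case mentioned above: the test below depends on testdata/config.json, so editing that JSON file changes its outcome, yet a purely package-level static analysis would not schedule it for re-execution.

package config

import (
	"os"
	"testing"
)

// TestParseConfig depends on a non-Go artifact: its outcome changes when
// testdata/config.json is edited, which package-level analysis cannot see.
func TestParseConfig(t *testing.T) {
	data, err := os.ReadFile("testdata/config.json")
	if err != nil {
		t.Fatal(err)
	}
	if len(data) == 0 {
		t.Fatal("configuration must not be empty")
	}
}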

Next steps for symflower test-runner

The next iterations of symflower test-runner will introduce the following features:

  • Static detection of dependencies based on the call graphs of code
  • Dynamic detection of dependencies based on symbolic execution
  • Automatic detection of irrelevant directories and files
  • Integration into VS Code and IntelliJ for test execution during typing
  • Support for Java and other languages

On leveraging our symbolic execution engine in further iterations of symflower test-runner: although performing test impact analysis with symbolic execution is computationally intensive, it leads to more accurate results. As an example, take a test that only checks a particular condition of a function. With symbolic execution, individual conditions can be identified as dependencies of a test, which would not be possible with coverage-based dependency analysis.

Get notified about future testing content from us: sign up for our newsletter and follow us on Twitter and LinkedIn.

2024-09-12