Test impact analysis identifies tests affected by code changes to save time on test execution. Even a basic implementation leads to a 29% reduction in test execution time on average, showcased here with a benchmark of the drop-in test command `symflower test-runner`. In this article, you learn about common approaches and how they impact your daily software development process.
Sneak peek: benchmarking top Go repositories shows dramatic reductions in test execution time, no matter the size of the repository or the change:
Repository | With test runner | Without test runner | Time saved | Time saved (%) |
---|---|---|---|---|
Ebitengine | 1m26s | 2m33s | 1m7s | 43.61% |
Tailscale | 10m55s | 17m39s | 6m44s | 38.14% |
go-ethereum | 13m7s | 20m45s | 7m37s | 36.76% |
Gonum | 12m20s | 19m26s | 7m5s | 36.51% |
Hugo | 22m5s | 24m57s | 2m52s | 11.49% |
Dolt | 33m7s | 36m58s | 3m51s | 10.44% |
In the following sections, we provide a detailed analysis of benchmark results, showcase a practical implementation through a running example, and outline next steps to improve its algorithms.
Table of contents:
- Test impact analysis matters
- Levels of impact analysis
- Benchmarking test impact analysis for Go
- Run `symflower test-runner` for `go test`
- Next steps for `symflower test-runner`
Test impact analysis matters
The current standard for test execution during development, and especially during CI (Continuous Integration), is to execute all tests, regardless of what has changed. Even a quick typo fix in internal documentation (irrelevant to the user-facing product) leads to minutes or even hours of CI time, unnecessary boot-ups of multiple testing deployments, blocked resources, and laborious manual processes. All of that just to confirm that nothing at all has changed in the behavior of the end result. This is a wasteful use of resources and developer time, but it is accepted by everyone because it is automated.
Now imagine a perfect scenario where some automation identifies that the internal documentation fix for a typo does not require any testing time, checks, or deployments. It might not even require any manual processes besides a quick glance from a reviewer. A change that took hours to publish now takes seconds. That is the purpose of test impact analysis.
Identifying which checks and tests actually need to be performed enables:
- Cutting execution time by running only the affected checks and tests
- Saving resources and infrastructure costs by using them only when needed
- Accelerating software development and identifying bugs earlier while coding
Automated test impact analysis fundamentally changes how developers write code. Remember that there was a time when editors had no continuous syntax checking. With test impact analysis, test suites are finally not as costly to execute and can be run continuously in the background while coding. This leads to early insights, allowing developers to reuse their current context and stay in the flow. It can also be used to automatically fix simple mistakes right away, similar to how a grammar checker fixes typos as you type.
However, implementing test impact analysis comes with its own challenges, mainly stemming from the various levels of granularity that need to be considered. These are discussed in the next section.
Levels of impact analysis
Test impact analysis involves three challenges:
- Granularity of dependencies
- Connecting dependencies to individual test cases
- Granularity of test execution
Advancing towards more granularity needs to be a balanced process: it doesn’t matter that a dependency graph is granular down to the characters of a file if you cannot identify and execute individual test cases.
Granularity of dependencies
With test impact analysis, the basic premise is to walk a dependency graph for each individual test. A test must be executed again even if just one of its dependencies has changed. Dependency in this case refers to relationships in the application structure, e.g. package `A` of project `X` may be using package `B` that’s part of project `Y`.
Multiple levels of dependencies need to be considered:
- Environment, e.g. the programming language and tool versions used
- Project
- Module
- Package
- File
- Type
- Function
- Control flow (e.g. if a test executes only the if-branch of an if-statement, the else-branch is not a dependency)
All these levels are not isolated and can involve higher levels, e.g. an if-branch can involve a whole new project. The more granular the analysis of dependency levels, the more granularly individual changes can be mapped to single test cases. Additionally, the better the understanding of dependencies, the more tests can be ignored at execution, e.g. knowing that internal documentation changes are not connected to any code can bring such changes to a CI runtime of 0 seconds.
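As a small, hypothetical Go illustration of the control-flow level: the test below only ever executes the if-branch, so a change that is confined to the else-branch cannot affect its outcome.

```go
package discount

import "testing"

// Discount returns the discount rate for a given order value.
func Discount(orderValue float64) float64 {
	if orderValue >= 100 {
		return 0.1 // A change here affects TestDiscountLargeOrder.
	}
	return 0.0 // A change confined to this branch does not: the test never reaches it.
}

// TestDiscountLargeOrder executes only the if-branch of Discount, so at the
// control-flow level the else-branch is not one of its dependencies.
// (Condensed into one listing; the test would normally live in a "_test.go" file.)
func TestDiscountLargeOrder(t *testing.T) {
	if got := Discount(150); got != 0.1 {
		t.Errorf("expected discount 0.1, got %g", got)
	}
}
```

Establishing this branch-level connection is typically out of reach for a purely static analysis, which is where the dynamic approach below comes in.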
There are two approaches to analyze dependencies:
- Static (not executing tests): Fast and very accurate, but limited in granularity; it cannot paint a perfect picture since some dependencies (such as the control flow) can only be resolved at runtime.
- Dynamic (executing tests): Requires executing individual tests, either via modified tools or via symbolic execution. However, it allows perfect granularity in connecting dependencies and tests.
Connecting dependencies to individual test cases
Individual test cases are the “outermost nodes” of a fully established dependency graph: they depend on other nodes, but no node of the graph depends on them. With this, the dependency graph for test impact analysis is complete, meaning that one can always decide whether a test case is affected by a change. At this point, if a dependency has changed, all tests that must be rerun can be found by traversing the graph.
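The traversal itself is straightforward. Here is a minimal sketch (the data structures and names are our own illustration, not symflower’s internals): with a reverse dependency graph at hand, finding the affected tests is a breadth-first search from the changed nodes.

```go
package impact

// AffectedTests walks a reverse dependency graph ("dependents" maps a node to
// all nodes that depend on it) starting from the changed nodes and collects
// every reachable test case.
func AffectedTests(dependents map[string][]string, isTest map[string]bool, changed []string) []string {
	visited := map[string]bool{}
	queue := append([]string{}, changed...)

	var tests []string
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if visited[node] {
			continue
		}
		visited[node] = true

		if isTest[node] {
			tests = append(tests, node)
		}
		queue = append(queue, dependents[node]...)
	}

	return tests
}
```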
Since dependency graphs can be enormous and can change dramatically with every keystroke, caching must be involved at all levels. Otherwise, it might take more time to calculate which tests must be executed than to actually execute those tests.
Granularity of test execution
Now that all test cases have their dependency graph and we can identify which tests must be executed for specific changes, there is only one problem left to solve: how to actually execute individual test cases. Not every testing tool supports executing individual test cases granularly. However, there is always the fallback of putting all tests that should run into their own program and executing that.
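In Go, granular execution is already built in: `go test -run` accepts slash-separated regular expressions that select individual test functions and their subtests (using the test names from the worked example further below):

```
go test -run 'TestCircleArea/Positive_radius' ./area
```

This runs only the `Positive_radius` subtest of `TestCircleArea` and skips everything else in the package.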
The next sections showcase a benchmark of an implementation of test impact analysis for Go repositories that applies static analysis at the package level, achieving an average reduction of 29% in test execution time.
Benchmarking test impact analysis for Go
To evaluate `symflower test-runner` on real-life projects, we searched GitHub for popular Go projects. We ran the test suites of the repositories both with and without the test runner for the 5 most recent merges.
If you want to try `symflower test-runner` on your own project, see the example below.
This table summarizes the results of the benchmark:
Repository | With test runner | Without test runner | Time saved | Time saved (%) |
---|---|---|---|---|
Ebitengine | 1m26s | 2m33s | 1m7s | 43.61% |
Tailscale | 10m55s | 17m39s | 6m44s | 38.14% |
go-ethereum | 13m7s | 20m45s | 7m37s | 36.76% |
Gonum | 12m20s | 19m26s | 7m5s | 36.51% |
Hugo | 22m5s | 24m57s | 2m52s | 11.49% |
Dolt | 33m7s | 36m58s | 3m51s | 10.44% |
In summary, even a simple static analysis at the package level resulted in an average 29% reduction in test time. Running this benchmark gave us deep insights into how changes in a variety of projects are connected to their tests, detailed in the next subsections. With that information, clear next steps can be formulated to save even more time in subsequent iterations of `symflower test-runner`.
The next subsections go into the details of the individual repositories.
Ebitengine
In this repository, for the 1st, 2nd, and 4th revisions, only a subset of the packages was tested. In the 3rd revision, all tests needed to be executed because of a change in the `go.mod` file, resulting in a higher execution time. In the 5th revision, however, no tests needed to be run because only a Java file was modified.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
26feb2623754c17107005ac6376e2b4363a7be53 | 17 | 32 |
df821f01774638bf8cb78f387ad81fb0a4b5adcb | 17 | 32 |
e058bb6fd323b4029a054af8a16007f31aee42d7 | 35 | 29 |
15dfb02f9fa7cf633a87332c60020485a1a2e231 | 15 | 31 |
a786f23e28e7f7f5eae8425d0d37656f84985e8f | 3 | 30 |
Tailscale
`symflower test-runner` had a positive impact on test execution time in all revisions. In the 1st revision, only one package was affected by a change, resulting in a difference of approximately 210 seconds compared to running all tests. In the remaining revisions, only a subset of tests was executed, resulting in lower execution times.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
a228d77f8620cb6ba693e28cda48705c8418f7e7 | 19 | 229 |
0970615b1b455a94214abd61b5e79d05d6d8f9bf | 145 | 191 |
0a2e5afb263ae58916871640d0ddc1549e2657e1 | 132 | 192 |
209567e7a0b939cbc3b2067cb5ef5f89305bd075 | 178 | 196 |
d6dfb7f242b91cac34f70cce654c0a61daf247ea | 181 | 250 |
go-ethereum
For every revision except the 4th one, only a subset of packages was tested, resulting in lower execution times. In the 4th revision, no tests needed to be run because only a text file was changed.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
d71831255da7ecb3817c9e14c9142fe9d4441d3b | 174 | 207 |
88c84590057acf0bde6fc7a2fb3ca9cf593a2318 | 104 | 200 |
8f4fac7b86227e3ceca095e60d58f126d692f379 | 228 | 275 |
83775b1dc7f59053ec69085e746c4dd9b9be3a0a | 41 | 248 |
5035f99bce9bc23db27b68dd8c4a927f9d7d2ef6 | 240 | 316 |
Gonum
In each revision, test execution times decreased when using `symflower test-runner`. In the 1st revision, all tests were executed because the `go.mod` file was modified. In the 2nd, 4th, and 5th revisions, only a subset of tests was executed. In the 3rd revision, no tests were executed since only the `AUTHORS` and `CONTRIBUTORS` files of the repository were changed, resulting in a lower execution time.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
1ca563a018b641e805317f1ac9ae0d37b32d162c | 181 | 208 |
bdcda9a453049449163d160b98285b64ec8093a1 | 109 | 228 |
a9b228ed6bdcfafd52ce8ba413595310823a0004 | 4 | 229 |
1f29d7b1d1724243c9f4a156cb1e16c9cbb15de1 | 219 | 262 |
f1a62e187e273b2d99f9c2a04fa8931df9c22947 | 227 | 239 |
Hugo
In all revisions but the 4th one in the table below, all tests were executed because a change in the `go.mod` file was detected. However, in the 4th revision, we see a huge difference in the execution time. The reason is that only changes in Markdown files were detected, so no tests needed to be run, resulting in an approximately 223-second reduction in execution time.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
0c453420e6fceccf36d06cea0a9e53ac6b8401ba | 399 | 334 |
e99eba39e7f9f9fed454a7671635052600685cea | 335 | 334 |
af0cb57aaf668278551454678eac60b17348b13c | 291 | 279 |
e8cc785a589bb18c9336880d662a659f29bb57f3 | 12 | 235 |
b8d5090452ee482a4191622201f1548e651753f7 | 288 | 315 |
Dolt
In the 3rd and 4th revisions, all tests needed to be executed because there were changes in the `go.mod` file. For all other revisions, only a subset of the tests was executed.
Revision | With test runner (s) | Without test runner (s) |
---|---|---|
9450b292cecba77ad71222222ef81c773ba5b5f6 | 374 | 444 |
dec900cdd3e270106396f66bb131164380c3f8ec | 350 | 422 |
d56b31f1c11c19e8f35e4eeb63ec9b5cf4b2f399 | 456 | 459 |
b9cd0fc7fe0feb3c87b81b60ef10efbcac5d45e7 | 435 | 451 |
bbeddb44fd85489779a9ccc8d0ced95a8d936eba | 372 | 443 |
Run `symflower test-runner` for `go test`
Apply the following steps to run `symflower test-runner` for your own Go project:
- Install `symflower`
- Check that the `symflower` binary can be executed using `symflower version`
- Change the directory to your repository
- Run `symflower test-runner --commit-from HEAD~ -- go test -v` to run the affected tests for the last commit
- Take a look at `symflower test-runner --help` for even more options
⚠️ Important note
To make sure every change to production code is 100% tested, we currently recommend using `symflower test-runner` only for testing CI pipelines.
An example
As an example, let’s create a project that handles operations related to geometric shapes. Here’s our project structure:
```
shapes
├── area
│   ├── area.go
│   └── area_test.go
├── go.mod
├── go.sum
└── perimeter
    ├── perimeter.go
    └── perimeter_test.go
```
Function #1: the file `area.go` contains a function that calculates the area of a circle:

```go
package area

import (
	"errors"
	"math"
)

func CircleArea(radius float64) (area float64, err error) {
	if radius <= 0 {
		return 0, errors.New("radius must be a positive value")
	}
	return math.Pi * radius * radius, nil
}
```
The file `area_test.go` contains the tests for the `CircleArea` function:

```go
package area

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCircleArea(t *testing.T) {
	type testCase struct {
		Name         string
		Radius       float64
		ExpectedArea float64
		ExpectedErr  error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualArea, actualErr := CircleArea(tc.Radius)

			assert.InDelta(t, actualArea, tc.ExpectedArea, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name:        "Negative radius",
		Radius:      -1,
		ExpectedErr: errors.New("radius must be a positive value"),
	})
	validate(t, &testCase{
		Name:         "Positive radius",
		Radius:       5.0,
		ExpectedArea: 78.5,
	})
}
```
Function #2: the file `perimeter.go` contains a function that calculates the perimeter of a circle:

```go
package perimeter

import (
	"errors"
	"math"
)

func CirclePerimeter(radius float64) (perimeter float64, err error) {
	if radius <= 0 {
		return 0, errors.New("radius must be a positive value")
	}
	return 2 * math.Pi * radius, nil
}
```
The file `perimeter_test.go` contains the tests for the above `CirclePerimeter` function:

```go
package perimeter

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCirclePerimeter(t *testing.T) {
	type testCase struct {
		Name              string
		Radius            float64
		ExpectedPerimeter float64
		ExpectedErr       error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualPerimeter, actualErr := CirclePerimeter(tc.Radius)

			assert.InDelta(t, actualPerimeter, tc.ExpectedPerimeter, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name:        "Negative radius",
		Radius:      -1,
		ExpectedErr: errors.New("radius must be a positive value"),
	})
	validate(t, &testCase{
		Name:              "Positive radius",
		Radius:            5.0,
		ExpectedPerimeter: 31.4,
	})
}
```
Let’s initialize a Git repository within this project and commit all these files:
```
git init
git add .
git commit -m "Init"
```
We’re going to make some changes now by adding a function that calculates the area of a square, along with the corresponding tests. Let’s add the following code to `area.go`:

```go
func SquareArea(side float64) (area float64, err error) {
	if side <= 0 {
		return 0, errors.New("side must be a positive value")
	}
	return side * side, nil
}
```
Let’s add the following tests to `area_test.go`:

```go
func TestSquareArea(t *testing.T) {
	type testCase struct {
		Name         string
		Side         float64
		ExpectedArea float64
		ExpectedErr  error
	}

	validate := func(t *testing.T, tc *testCase) {
		t.Run(tc.Name, func(t *testing.T) {
			actualArea, actualErr := SquareArea(tc.Side)

			assert.InDelta(t, actualArea, tc.ExpectedArea, 0.1)
			assert.Equal(t, tc.ExpectedErr, actualErr)
		})
	}

	validate(t, &testCase{
		Name:        "Negative side",
		Side:        -1,
		ExpectedErr: errors.New("side must be a positive value"),
	})
	validate(t, &testCase{
		Name:         "Positive side",
		Side:         5.0,
		ExpectedArea: 25.0,
	})
}
```
Running `git status` at this point, you should see the following modified files:

```
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   area/area.go
	modified:   area/area_test.go

no changes added to commit (use "git add" and/or "git commit -a")
```
We can now use `symflower test-runner` to run only the tests that are necessary for the changes we made. Let’s run the following command:

```
symflower test-runner -- go test -v
```
Here’s the output:
```
Detected changes:
- "shapes/area/area.go"
- "shapes/area/area_test.go"
Affected by change:
- "shapes/area"
- "shapes/area [shapes/area.test]"
- "shapes/area.test"
Executing "go test -v shapes/area"
=== RUN   TestCircleArea
=== RUN   TestCircleArea/Negative_radius
=== RUN   TestCircleArea/Positive_radius
--- PASS: TestCircleArea (0.00s)
    --- PASS: TestCircleArea/Negative_radius (0.00s)
    --- PASS: TestCircleArea/Positive_radius (0.00s)
=== RUN   TestSquareArea
=== RUN   TestSquareArea/Negative_side
=== RUN   TestSquareArea/Positive_side
--- PASS: TestSquareArea (0.00s)
    --- PASS: TestSquareArea/Negative_side (0.00s)
    --- PASS: TestSquareArea/Positive_side (0.00s)
PASS
ok      shapes/area     0.003s
```
As you can see, the command outputs the detected changes (in our case, the changes we made to the `area.go` and `area_test.go` files) and the list of packages affected by those changes.
Instead of running all the tests, the command decided to run only the tests from the `shapes/area` package, since they were the only ones affected by our changes. As mentioned above, you can also specify the commit from which changes are taken into account. Let’s commit our latest changes:
```
git add .
git commit -m "Square area"
```
If you now run `symflower test-runner --commit-from HEAD~ -- go test -v`, the command output is exactly the same as the one above.
The algorithm inside `symflower test-runner` does the following (a rough sketch in code follows after the list):
- It performs a `git diff` to retrieve the list of all files that have changed up to the specified commit revision (or `HEAD` if none was specified).
- Each package in the project is analyzed to check whether it has a direct or indirect package dependency containing a change found by the `git diff` command.
- If so, the package name is collected so that its tests are executed. If a `go.mod` file is modified, the command runs all tests.
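To make these steps concrete, here is a self-contained toy version of such a package-level analysis in Go. It is our own illustration built on `git diff` and `go list`, not symflower’s actual implementation, and it ignores details such as external test packages:

```go
// impactsketch: a toy package-level test impact analysis for a Go module.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// pkg mirrors the subset of the `go list -json` output that we need.
type pkg struct {
	ImportPath string
	Dir        string
	Deps       []string // Transitive dependencies as import paths.
}

func main() {
	// Step 1: collect all files changed since the previous commit.
	diff := run("git", "diff", "--name-only", "HEAD~")
	root := strings.TrimSpace(run("git", "rev-parse", "--show-toplevel"))

	// Step 2: load every package of the module with its transitive dependencies.
	byDir := map[string]string{}  // Absolute package directory -> import path.
	deps := map[string][]string{} // Import path -> transitive dependencies.
	dec := json.NewDecoder(bytes.NewReader([]byte(run("go", "list", "-deps", "-json", "./..."))))
	for dec.More() {
		var p pkg
		if err := dec.Decode(&p); err != nil {
			panic(err)
		}
		byDir[p.Dir] = p.ImportPath
		deps[p.ImportPath] = p.Deps
	}

	// Step 3: map changed files to changed packages. A go.mod change means all tests run.
	changed := map[string]bool{}
	for _, file := range strings.Fields(diff) {
		if filepath.Base(file) == "go.mod" {
			fmt.Println("go.mod changed: run all tests")
			return
		}
		if importPath, ok := byDir[filepath.Join(root, filepath.Dir(file))]; ok {
			changed[importPath] = true
		}
	}

	// Step 4: a package's tests must run if the package itself changed or if
	// any of its (transitive) dependencies did.
	for importPath, dependencies := range deps {
		affected := changed[importPath]
		for _, dependency := range dependencies {
			if changed[dependency] {
				affected = true
				break
			}
		}
		if affected {
			fmt.Println(importPath) // Candidate for `go test <package>`.
		}
	}
}

// run executes a command and returns its standard output, panicking on error.
func run(name string, args ...string) string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		panic(err)
	}
	return string(out)
}
```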
This approach already works great. However, it could be much more granular, and it can also overlook changes that require running some or even all tests. For example, changing a non-Go file that is used by a test (e.g. a configuration file that the test reads) requires that test to be executed, but most of the time such artifacts cannot be determined by static analysis. Additionally, if configurations (e.g. tool versions) are changed, usually all tests should be executed; currently, only a check for the `go.mod` file is implemented. These are just some examples of how test impact analysis can be improved. We are addressing a range of scenarios with the next iteration, described in the section below.
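To illustrate the non-Go-file limitation with a hypothetical test: the file `testdata/config.json` below is a real dependency of the test, but it never shows up as a Go import, so a package-level static analysis would not rerun this test when only the JSON file changes.

```go
package config

import (
	"os"
	"testing"
)

// TestLoadConfig depends on "testdata/config.json" at runtime. A dependency
// analysis that only follows Go imports cannot see this relationship.
func TestLoadConfig(t *testing.T) {
	data, err := os.ReadFile("testdata/config.json")
	if err != nil {
		t.Fatalf("reading configuration: %v", err)
	}
	if len(data) == 0 {
		t.Error("expected a non-empty configuration file")
	}
}
```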
Next steps for `symflower test-runner`
The next iterations of `symflower test-runner` will introduce the following features:
- Static detection of dependencies based on the call graphs of code
- Dynamic detection of dependencies based on symbolic execution
- Automatic detection of irrelevant directories and files
- Integration into VS Code and IntelliJ for test execution during typing
- Support for Java and other languages
On leveraging our symbolic execution engine in further iterations of `symflower test-runner`: although performing test impact analysis with symbolic execution is computationally intensive, it leads to more accurate results. As an example, take a test that only checks a particular condition of a function. With symbolic execution, individual conditions can be identified as dependencies of a test, which would not be possible with coverage-based dependency analysis.
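A hypothetical illustration: in the test below, Go’s short-circuiting `&&` means the second condition is never even evaluated, yet line coverage still marks the whole statement as covered. Symbolic execution can determine that the test’s outcome depends only on the first condition.

```go
package bounds

import "testing"

// valid reports whether both bounds hold. Go's "&&" short-circuits, so with
// a non-positive x the condition "y > 0" is never evaluated.
func valid(x, y int) bool {
	return x > 0 && y > 0
}

func TestValidNegativeX(t *testing.T) {
	// Line coverage marks the whole return statement of "valid" as covered,
	// so a coverage-based analysis would rerun this test for any change to it.
	// Symbolic execution shows the result depends only on "x > 0" here, so a
	// change confined to "y > 0" cannot affect this test.
	if valid(-1, 10) {
		t.Error("expected valid to be false for a non-positive x")
	}
}
```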
Get notified about future testing content from us: sign up for our newsletter and follow us on Twitter and LinkedIn.