Shift Left for Faster Test Feedback with Smart Test Selection

In modern software development, testers face a constant dilemma: On the one hand, successful software grows. We rarely remove features; we mostly add code. So there’s more and more code to test with every release. On the other hand, release cycles are becoming ever shorter, so there’s less and less time to run our tests.

How can we test more and more software in less and less time?

»Shift Left« seems like the answer: we simply run our tests more frequently and earlier in the development lifecycle to detect newly introduced bugs much sooner and provide developers with rapid feedback.

However, our most valuable test suites, particularly end-to-end tests, have grown over many years. They take hours, days or even weeks to execute, even when we apply heavy parallelization or performance optimizations to speed them up. It seems infeasible to shift such a slow test suite left and run it more frequently.

Luckily, there is a tried-and-tested solution: Test Selection.

Test Selection is a powerful solution to the challenge of long-running test suites. The fundamental idea is to run only a small, highly effective subset of your full test suite. We can design this subset to require only a fraction of the full suite’s runtime, e.g., minutes instead of hours, enabling much more frequent execution.

The key to this approach is concentrating a significant amount of bug-finding power within this small subset of tests.

Test Selection

Note that the small subset might not reveal all of the bugs that the full test suite finds. Our goal is to reveal the majority of bugs within a much smaller runtime. Does that mean we miss crucial bugs? No! Because we run the small subset in addition to the infrequent runs of all tests that we’ve been doing all along. And since these new frequent test runs reveal most bugs much earlier, we gain much faster feedback on the majority of bugs. We may even run the small subset on branches where end-to-end tests previously could not be run economically, e.g., feature or developer branches, thereby further accelerating the development cycle.

Smart Selection

There are several approaches to Test Selection. We thoroughly tested all of them to find those that deliver good results and can easily be applied at large scale, even in legacy industry projects. Teamscale, our Software Quality Solution, uses these approaches to help our customers regain fast feedback from tests in their development process.

Quality Gates vs. Change-Based Testing

In working closely with dozens of teams, we found two main use cases for Test Selection.

Quality Gates: In this use case, we generate a fixed set of tests that can be repeatedly executed over a longer period of time (weeks, months), providing a consistent quality gate. This type of selection allows us to protect valuable resources, such as:

  • Expensive test runs. If running the tests is costly, e.g., due to expensive hardware or massive parallelization, then the worst case is a single bug that causes many tests to fail. Such bugs waste entire test runs, since they hide other bugs, depriving us of most of the value of the expensive run. Fortunately, these bugs can be detected at much smaller cost with our quality gate: only software versions that pass the quality gate go into expensive test runs, saving test execution costs.
    Quality Gate
  • Integration branches. When buggy changes are merged to a branch shared by many developers, they can easily keep entire teams from working effectively. To prevent this, we can use our quality gate: only changes that pass the quality gate may be merged, greatly increasing the stability of shared branches.
    Integration Branch

Change-based Testing: In this use case, we generate a different set of tests for every test run, specifically tailored to the particular code changes under test. This delivers a far more precise and accurate selection, especially for smaller changes like pull requests, developer branches, single features, or even uncommitted changes in a developer’s IDE, at the price of a more complex setup, since we also need to dynamically incorporate the change information.

Change-based Testing

Quality gates and change-based testing can be combined to optimize different stages of development. For example, we might use a quality gate before expensive full test runs, while simultaneously applying change-based testing on feature branches and pull requests to ensure that specific changes are thoroughly tested before integration. This dual strategy maximizes both comprehensive quality assurance and rapid feedback.

Build a Quality Gate with AI Test Clustering

From working with end-to-end test suites that have grown over many years, we learned that these test suites typically grow by copy and paste: if you need a new test case, you copy and adjust an existing one. Repeat that year over year and you get a test suite that contains lots of very similar tests.

A high-level bug typically makes several of these similar tests fail, so we only need to run one of them to find the bug. Thus, we break up these redundant clusters of similar tests to cover large portions of the software system’s functionality with only a small percentage of the tests, giving us a good chance to find bugs anywhere in the application.

This effectively transforms the Test Selection problem into a clustering problem: We cluster similar tests together and select the most dissimilar tests for execution. Technically, we represent tests as numerical vectors in a multi-dimensional space, where similar tests are positioned closely together. The selection strategy then involves iteratively picking test cases that are "furthest apart" from those already selected, thereby ensuring a diverse and non-redundant subset.
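The iterative "furthest apart" picking described above can be sketched as greedy farthest-point sampling. The following is a minimal illustration, not Teamscale’s actual implementation; the tiny hand-made 2-D vectors stand in for real LLM embedding vectors:

```python
import numpy as np

def select_diverse_tests(embeddings, k):
    """Greedy farthest-point selection: pick k tests whose embedding
    vectors are maximally spread out, so the selected subset covers
    diverse, non-redundant functionality."""
    selected = [0]  # start from an arbitrary test
    # distance of every test to its nearest already-selected test
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(k - 1):
        next_idx = int(np.argmax(dist))  # test furthest from the subset
        selected.append(next_idx)
        # update nearest-selected distances with the newly picked test
        new_dist = np.linalg.norm(embeddings - embeddings[next_idx], axis=1)
        dist = np.minimum(dist, new_dist)
    return selected

# Four tests: two near-duplicates and two clearly distinct ones
vectors = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [0.0, 5.0]])
print(select_diverse_tests(vectors, 3))  # → [0, 2, 3]: one per cluster
```

Note how the near-duplicate test at index 1 is never picked: its cluster is already represented by test 0, which is exactly the redundancy-breaking effect described above.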

Vector Space

The power of this technique stems from its use of Large Language Models (LLMs) to generate these vectors, known as embedding vectors. We generate them directly from the source code of automated tests or the instructions of manual tests. These embeddings capture the semantic meaning of the tests, such that semantically similar tests result in similar vectors.

We can import the source code of the automated tests from your version control system and read manual test instructions from your Application Lifecycle Management (ALM) system. This allows us to generate a new quality gate with the push of a button, e.g., any time your test suite changes significantly or regularly after every development iteration.

AI Test Clustering

How effective is the technique? Our experiments on open-source and industry systems show that this approach enables teams to find 90% of bugs in just 13% of the full test suite’s runtime. This enables you to turn a long-running and expensive test suite into a quality gate that you can quickly run before every merge: Simply instruct your test runner or manual testers to not run all your tests, but the specific subset calculated by Teamscale.

Change-based Testing with Similarity Scoring

These numbers are already great, but we can do even better by considering the particular changes that should be tested. Unlike with quality gates, we then generate a unique set of tests specifically for each set of changes, giving us higher precision and accuracy.

From your version control system (VCS) we precisely identify which lines of code have changed in your software system, e.g., for a new feature or a bug fix. Then we find tests that match well to these changes, which is, essentially, a search problem: Given a set of changes (query), search for test cases that match the changes best (results). Much like Google indexes a billion web pages to find those most relevant to your search query, we index your tests to find those most relevant to your changes. There is years of search engine research showing us how to do this.

Similarity Scoring

The first step is to index the content of your tests – be it the source code of automated tests from the VCS or the detailed instructions of manual tests from your ALM – into a searchable database. Subsequently, we generate a search query from the changes themselves, for instance, by extracting terms like "search", "login" or "user" if those functionalities were modified.
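As a rough sketch of this index-and-query idea, here is a toy TF-IDF ranking over a hypothetical three-test index. The test names, contents, and the change text are invented for illustration; a production search engine would use a far more sophisticated index and scoring scheme:

```python
import math
import re
from collections import Counter

# Hypothetical mini-index: test name -> its source code or instructions.
tests = {
    "LoginTest": "def test_login(): user = login('alice', 'secret'); assert user.logged_in",
    "SearchTest": "def test_search(): results = search('query'); assert results",
    "CheckoutTest": "def test_checkout(): cart.checkout(); assert order.paid",
}

def terms(text):
    """Extract lowercase word terms, splitting on non-letters."""
    return [t.lower() for t in re.findall(r"[a-zA-Z]+", text)]

def rank_tests(change_text):
    """Rank tests by TF-IDF-weighted term overlap with the change:
    the changed code acts as the search query, the tests as documents."""
    n = len(tests)
    # document frequency: in how many tests does each term occur?
    df = Counter(t for doc in tests.values() for t in set(terms(doc)))
    query = set(terms(change_text))
    scores = {}
    for name, doc in tests.items():
        tf = Counter(terms(doc))
        scores[name] = sum(tf[t] * math.log(n / df[t]) for t in query if t in tf)
    return sorted(scores, key=scores.get, reverse=True)

change = "fix password check in login(user, password)"
print(rank_tests(change)[0])  # → LoginTest
```

Terms like "def" or "assert" occur in every test, so their inverse document frequency is zero and they contribute nothing, while distinctive terms like "login" dominate the score.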

You input a pull request, issue tracker ticket or code change, and Teamscale provides a ranked list of test cases, with the most relevant ones at the top. You then define a specific test budget, for example 15 minutes, and execute tests from the top of this prioritized list until the budget is met.
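The budget-driven execution step can be sketched as follows. Test names and durations are invented, and the greedy skip-if-too-long policy is one simple choice among several, not necessarily the one Teamscale uses:

```python
def select_within_budget(ranked_tests, durations, budget_minutes):
    """Walk the ranked list top-down and take every test that still
    fits into the remaining time budget."""
    selected, used = [], 0.0
    for test in ranked_tests:
        duration = durations[test]
        if used + duration <= budget_minutes:
            selected.append(test)
            used += duration
    return selected

# Hypothetical ranked list (most relevant first) with durations in minutes
ranked = ["LoginTest", "CheckoutTest", "SearchTest", "ProfileTest"]
durations = {"LoginTest": 6, "CheckoutTest": 5, "SearchTest": 7, "ProfileTest": 3}
print(select_within_budget(ranked, durations, 15))
# → ['LoginTest', 'CheckoutTest', 'ProfileTest']
```

With a 15-minute budget, SearchTest (7 minutes) no longer fits after the top two tests, so the walk skips it and still squeezes in the shorter ProfileTest.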

Our experiments on open-source and industry systems show that this approach enables teams to find 90% of bugs in just 4% of the full test suite’s runtime. This level of efficiency makes it feasible to transform even a once-a-week test run into fast, per-pull-request checks, ensuring thorough testing very early in the development lifecycle. The only downside, compared to a quality gate, is the higher effort for setting this up, as your test runner or manual testers need to query for the relevant tests at the start of each test run.

Test Selection: Optimizing Your Testing Strategy

The imperative to "shift left" in software development requires smarter testing strategies to cope with growing codebases and shortening release cycles. Test Selection enables you to run smaller, yet highly effective, subsets of your tests more frequently, thereby accelerating feedback on most bugs. Teamscale provides Test Selection approaches that significantly speed up feedback from your long-running test suites:

  • It uses Large Language Models to convert test code into semantic embeddings, allowing it to group similar tests and then select a diverse, non-redundant subset. This approach allows you to find 90% of bugs in just 13% of your full test suite’s runtime. Ideal for protecting a shared branch or avoiding unnecessary costly test runs through a quality gate.

  • By analyzing specific code changes and indexing test content, it dynamically finds the most relevant tests for each change. This targeted approach is exceptionally efficient, allowing you to find 90% of bugs in as little as 4% of your full test suite’s runtime, making it perfect for rapid feedback on pull requests or feature branches.

Depending on how much pain your slow test suites cause you, one or the other approach is the right fit for you. Or you can opt for a strategic combination of both approaches to optimize different stages of the development and testing process. In any case, Test Selection effectively enables you to »Shift Left« the detection of most bugs in the development process, ensuring faster feedback and higher software quality without increasing testing costs.

Precision and Effort

 

Are you interested in our Smart Test Selection approaches? Reach out or watch the recording of our workshop on the topic.