In modern software development, testers face a constant dilemma: On the one hand, successful software grows. We rarely remove features; we just keep adding code, so there’s more and more code to test with every release. On the other hand, release cycles are becoming ever shorter, so there’s less and less time to run our tests.
»Shift Left« seems like the answer: we simply run our tests more frequently and earlier in the development lifecycle to detect newly introduced bugs much sooner and provide developers with rapid feedback.
However, our most valuable test suites, particularly end-to-end tests, have grown over many years. They take hours, days, or even weeks to execute, even with heavy parallelization and performance optimizations to speed them up. It seems infeasible to shift such a slow test suite left and run it more frequently.
Luckily, there is a tried-and-tested solution: Test Selection.
Test Selection is a powerful solution to the challenge of long-running test suites. The fundamental idea is to run only a small, highly effective subset of your full test suite. We can design this subset to require only a fraction of the full suite’s runtime, e.g., minutes instead of hours, enabling much more frequent execution.
The key to this approach is concentrating a significant amount of bug-finding power within this small subset of tests.
Note that the small subset might not reveal all of the bugs that the full test suite finds. Our goal is to reveal the majority of bugs within a much smaller runtime. Does that mean we miss crucial bugs? No, because we run the small subset in addition to the infrequent runs of all tests that we’ve been doing all along. And since these new frequent test runs reveal most bugs much earlier, we gain much faster feedback on the majority of bugs. We may even run the small subset on branches where end-to-end tests previously could not be run economically, e.g., feature or developer branches, thereby further accelerating the development cycle.
There are several approaches to Test Selection. We thoroughly tested all of them to find those that deliver good results and can easily be applied at large scale, even in legacy industry projects. Teamscale, our Software Quality Solution, uses these approaches to help our customers regain fast feedback from tests in their development process.
In working closely with dozens of teams, we found two main use cases for Test Selection.
Quality Gates: In this use case, we generate a fixed set of tests that can be repeatedly executed over a longer period of time (weeks, months), providing a consistent quality gate. This type of selection allows us to protect valuable resources, for example by acting as a gate in front of expensive full test runs.
Change-based Testing: In this use case, we generate a different set of tests for every test run, specifically tailored to the particular code changes under test. This delivers a far more precise and accurate selection, especially for smaller changes like pull requests, developer branches, single features, or even uncommitted changes in a developer’s IDE. The price is a more complex setup, since we also need to dynamically incorporate the change information.
Quality gates and change-based testing can be combined to optimize different stages of development. For example, we might use a quality gate before expensive full test runs, while simultaneously applying change-based testing on feature branches and pull requests to ensure that specific changes are thoroughly tested before integration. This dual strategy maximizes both comprehensive quality assurance and rapid feedback.
From working with end-to-end test suites that have grown over many years, we learned that these test suites typically grow by copy-and-paste. Meaning: if you need a new test case, you copy and adjust an existing test case. Repeat that year after year and you get a test suite that contains lots of very similar tests.
A high-level bug typically makes several of these similar tests fail, so we only need to run one of them to find the bug. Thus, we break up these redundant clusters of similar tests to cover large portions of the software system’s functionality with only a small percentage of the tests, giving us a good chance to find bugs anywhere in the application.
This effectively transforms the Test Selection problem into a clustering problem: We cluster similar tests together and select the most dissimilar tests for execution. Technically, we represent tests as numerical vectors in a multi-dimensional space, where similar tests are positioned closely together. The selection strategy then involves iteratively picking test cases that are "furthest apart" from those already selected, thereby ensuring a diverse and non-redundant subset.
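To make this concrete, here is a minimal sketch of such a diversity-based selection in Python. This is not Teamscale’s actual implementation: the bag-of-words vectors, the cosine distance, and the greedy farthest-point loop are simplified stand-ins for whatever representation and distance measure a production system would use.

```python
import math
from collections import Counter


def vectorize(test_text: str) -> Counter:
    # Represent a test as a simple bag-of-words vector.
    return Counter(test_text.lower().split())


def distance(a: Counter, b: Counter) -> float:
    # Cosine distance: 0.0 for identical tests, 1.0 for completely unrelated ones.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_diverse_subset(tests: dict[str, str], subset_size: int) -> list[str]:
    # Greedy farthest-point selection: repeatedly add the test that is
    # least similar to everything selected so far.
    vectors = {name: vectorize(text) for name, text in tests.items()}
    selected = [next(iter(vectors))]  # seed with an arbitrary test
    while len(selected) < min(subset_size, len(vectors)):
        best = max(
            (name for name in vectors if name not in selected),
            key=lambda name: min(distance(vectors[name], vectors[s]) for s in selected),
        )
        selected.append(best)
    return selected
```

Each round adds the test that is least similar to everything already chosen, so near-duplicates from the copy-and-paste clusters rarely make it into the subset.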
We can import the source code of the automated tests from your version control system and read manual test instructions from your Application Lifecycle Management (ALM) system. This allows us to generate a new quality gate with the push of a button, e.g., any time your test suite changes significantly or regularly after every development iteration.
The results achievable with quality gates are already great, but we can do even better by considering the particular changes that should be tested. Unlike a fixed quality gate, we then generate a unique set of tests specifically for each set of changes, giving us higher precision and accuracy.
From your version control system (VCS), we precisely identify which lines of code have changed in your software system, e.g., for a new feature or a bug fix. Then we find the tests that match these changes best, which is, essentially, a search problem: given a set of changes (the query), search for the test cases that match the changes best (the results). Much like Google indexes billions of web pages to find those most relevant to your search query, we index your tests to find those most relevant to your changes. There are years of search engine research showing us how to do this.
The first step is to index the content of your tests – be it the source code of automated tests from the VCS or the detailed instructions of manual tests from your ALM – into a searchable database. Subsequently, we generate a search query from the changes themselves, for instance, by extracting terms like "search", "login" or "user" if those functionalities were modified.
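As an illustration, the following sketch indexes test contents with a simple TF-IDF scheme and ranks them against the terms extracted from a change. The class, the toy tests, and the scoring formula are hypothetical simplifications rather than Teamscale’s actual indexing; a real search engine would use a more robust ranking function such as BM25.

```python
import math
from collections import Counter


class TestIndex:
    # A toy index over test contents (automated test source code from the VCS
    # or manual test instructions from the ALM).

    def __init__(self, tests: dict[str, str]):
        self.term_frequencies = {
            name: Counter(text.lower().split()) for name, text in tests.items()
        }
        document_frequency = Counter()
        for terms in self.term_frequencies.values():
            document_frequency.update(terms.keys())
        num_tests = len(tests)
        # Rare terms get a higher weight than terms that occur in every test.
        self.idf = {term: math.log(num_tests / df) for term, df in document_frequency.items()}

    def search(self, query_terms: list[str]) -> list[tuple[str, float]]:
        # Rank tests by how well they match the terms extracted from the changes.
        scores = {}
        for name, term_frequency in self.term_frequencies.items():
            score = sum(term_frequency[t] * self.idf.get(t, 0.0) for t in query_terms)
            if score > 0:
                scores[name] = score
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy test contents and query terms (hypothetical example values).
index = TestIndex({
    "LoginTest": "user login with valid credentials",
    "SearchTest": "user searches the product catalog",
    "CheckoutTest": "user pays for the shopping cart",
})
print(index.search(["login", "credentials"]))  # -> [('LoginTest', 2.197...)]
```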
You input a pull request, issue tracker ticket or code change, and Teamscale provides a ranked list of test cases, with the most relevant ones at the top. You then define a specific test budget, for example 15 minutes, and execute tests from the top of this prioritized list until the budget is used up.
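A minimal sketch of that budget-driven step could look like this; the test names, relevance scores, and per-test durations are made-up example values, assuming durations are known from previous test runs.

```python
def select_within_budget(ranked_tests: list[tuple[str, float]],
                         durations: dict[str, float],
                         budget_minutes: float) -> list[str]:
    # Walk the ranked list from the top and add tests until the budget is used up.
    selected, used = [], 0.0
    for test_name, _relevance in ranked_tests:
        duration = durations.get(test_name, 0.0)
        if used + duration > budget_minutes:
            continue  # this test no longer fits; a shorter one further down still might
        selected.append(test_name)
        used += duration
    return selected


# Hypothetical ranked list (most relevant first) and durations in minutes.
ranked = [("LoginTest", 12.4), ("SearchTest", 7.1), ("CheckoutTest", 2.3)]
durations = {"LoginTest": 6.0, "SearchTest": 4.5, "CheckoutTest": 8.0}
print(select_within_budget(ranked, durations, budget_minutes=15))
# -> ['LoginTest', 'SearchTest']; CheckoutTest would exceed the 15-minute budget
```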
The imperative to "shift left" in software development requires smarter testing strategies to cope with growing codebases and shortening release cycles. Test Selection enables you to run smaller, yet highly effective, subsets of your tests more frequently, thereby accelerating feedback on most bugs. Teamscale provides two Test Selection approaches that significantly speed up feedback from your long-running test suites: quality gates, which provide a stable, repeatedly executable subset, and change-based testing, which tailors the selection to each set of changes.
Depending on how much pain your slow test suites cause you, one or the other approach will be the right fit for you, or you can opt for a strategic combination of both to optimize different stages of the development and testing process. In any case, Test Selection effectively enables you to »Shift Left« the detection of most bugs in the development process, ensuring faster feedback and higher software quality without increasing testing costs.
Are you interested in our Smart Test Selection approaches? Reach out or watch the recording of our workshop on the topic.