
Dear AI, Which Tests should Robot Framework Execute Now?

Written by Dr. Elmar Jürgens | Dec 13, 2024 11:38:29 am

The more tests we have, the longer it takes to execute them all. This increases the feedback time between when a new bug is introduced and when a Robot Framework test reveals it. As a consequence, debugging gets more painful and costly. And let's be honest, it also takes a lot of the fun out of test automation, as recipients of late test failures are often unhappy about the news and tend to vent at the messenger. It is better for everyone involved when test results are delivered quickly.

In theory, there is a simple approach to fix this: Don’t always execute all tests. Instead, select a small subset of tests that runs much faster. Then execute this subset more frequently.

This is a good idea if this small subset finds a large percentage of the bugs in a small fraction of the time. For example, if we can find 80% of the bugs (that executing all tests would discover) in 5% of the time (that executing all tests would take), then we could improve feedback times massively for most bugs. In practice, however, this idea hinges entirely on how well we manage to select those tests.

We have spent the last decade working on this problem, both in research (through master's and PhD thesis projects) and in practice. We started out with approaches that do not use AI: for example, test impact analysis uses test-case-specific code coverage and greedy optimization algorithms to select the tests that cover a given set of code changes most quickly. In recent years, we have added approaches that use AI to tackle this question. For example, predictive test selection learns from code changes and past test failures to predict which tests can spot new bugs in new code changes. Other approaches use information retrieval or distances between LLM embeddings of test cases to suggest test cases without requiring code coverage information. Finally, defect prediction approaches go one step further and predict where in the code base bugs are most likely to occur.
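To make the first of these ideas concrete, here is a minimal, hypothetical sketch of coverage-based greedy test selection. All names and data are illustrative, not taken from any real tool: given per-test coverage sets, per-test runtimes, and the set of code units touched by a change, it greedily picks the test with the best coverage-per-second ratio until the change is covered.

```python
# Hypothetical sketch of greedy, coverage-based test selection
# (in the spirit of test impact analysis). All names and data are
# illustrative assumptions, not from a real project.

def select_tests(coverage, durations, changed):
    """Greedily pick tests that cover the changed code units fastest.

    coverage:  {test_name: set of code units the test covers}
    durations: {test_name: runtime in seconds}
    changed:   set of code units touched by the change
    """
    uncovered = set(changed)
    selected = []
    remaining = set(coverage)
    while uncovered and remaining:
        # Pick the test covering the most still-uncovered units per second.
        best = max(
            remaining,
            key=lambda t: len(coverage[t] & uncovered) / durations[t],
        )
        gain = coverage[best] & uncovered
        if not gain:
            break  # no remaining test covers anything new
        selected.append(best)
        uncovered -= gain
        remaining.discard(best)
    return selected

# Toy example: three suites with made-up coverage and runtimes.
coverage = {
    "login_suite":    {"auth.py", "session.py"},
    "checkout_suite": {"cart.py", "payment.py"},
    "smoke_suite":    {"auth.py", "cart.py", "payment.py"},
}
durations = {"login_suite": 30.0, "checkout_suite": 120.0, "smoke_suite": 20.0}

print(select_tests(coverage, durations, changed={"payment.py", "session.py"}))
# → ['smoke_suite', 'login_suite']
```

In this toy run, the fast smoke suite covers the payment change first, and the login suite is added only to reach the remaining changed unit; the slow checkout suite is skipped entirely. Real test impact analysis additionally has to collect and keep the coverage data up to date, which is where much of the engineering effort lies.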

Our team has implemented all of these approaches. We have tried them in our own development and test projects. We have also applied them in customer contexts. In this talk, I will share our failures and successes and outline how they can be applied when using Robot Framework.

I will also give a checklist of which approaches work best in which contexts: often you really can find 80% of the bugs in 5% or less of the time. But I will also reveal which approaches should be avoided at all costs, even when they look really shiny, because they do not work at all.