Containing the Findings Flood

Have you ever used static analysis tools? Then you probably know how they tend to flood you with so many analysis results (findings) that you don’t know where to even start reading and resolving them. Instead of turning these tools off again, read on for some strategies for dealing with the findings flood.

First things first

To establish a common understanding: by static analysis tools I mean all tools that point to potential problems in your source code without actually executing the code. These range from simple violations of formatting rules (“You should have placed this brace on the next line”) to potential bugs in the code (“If this code is executed, this will cause a null pointer dereference”). In the remainder of this post, I use the term finding for all of these problems.

You have probably seen such findings even if you are not aware of static analysis, as the warnings your compiler emits are the result of static analysis. However, there are many more specialized tools, both open source and commercial. Well-known examples are FindBugs (Java), PMD (Java), FxCop (C#), StyleCop (C#), PC-lint (C/C++), Goanna (C/C++), Pylint (Python), and so on. If you have never used such a tool, I urge you to try one on your code base.

Applying static analysis tools to a large code base usually results in tens of thousands of findings. I’ve also seen cases where the number of findings was close to one million. While many of these findings are interesting, there is no way to just take this list and address the findings one by one. Since not using these tools would deprive you of a valuable means of locating problems and bugs in your code, you have to develop strategies for dealing with the masses of results.

Configure your tools correctly

The first suggestion might seem trivial, but it is often a huge step forward: configure your tool correctly. Most tools ship with lots of rules that might not be right for every code base. If you do not care about comments in your code, you do not need a tool that warns about every missing interface comment. While analysis tools typically come with a fairly conservative default rule set, no single rule set can match every coding style. So just disable or filter all rules that do not apply to you. A good starting point is to look at the rules with the most findings. Often these hint at things that you are doing fundamentally differently from what the tool expects.
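To find the noisiest rules, it helps to count findings per rule before deciding what to disable. The following Python sketch assumes a hypothetical minimal `Finding` record (rule, path, line) and made-up rule names; real tools export richer data and offer their own configuration mechanisms for disabling rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal finding record; real tools report much more.
    rule: str
    path: str
    line: int


def rules_by_count(findings):
    """Return (rule, count) pairs, most frequent rule first."""
    return Counter(f.rule for f in findings).most_common()


def without_disabled(findings, disabled_rules):
    """Drop findings of rules the team decided do not apply to this code base."""
    return [f for f in findings if f.rule not in disabled_rules]


if __name__ == "__main__":
    findings = [
        Finding("missing-interface-comment", "api/Order.java", 10),
        Finding("missing-interface-comment", "api/Customer.java", 4),
        Finding("missing-interface-comment", "api/Invoice.java", 7),
        Finding("null-dereference", "core/Scheduler.java", 88),
    ]
    # Inspect the noisiest rules first: they often reveal a style mismatch.
    print(rules_by_count(findings))
    # After deciding that interface comments do not matter here:
    print(without_disabled(findings, {"missing-interface-comment"}))
```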

Formatting rules are a special case of this strategy. It is often easier to enforce formatting through consistent auto-formatting in your development environment or at commit time than to address it analytically. This is not only because these rules tend to produce lots of findings, but also because such findings are trivial to fix automatically, so fixing them manually is not worth developer time. In most cases, I strongly suggest disabling all rules that only deal with formatting.

Discriminate your code

Usually, there are some places in your code where you care more about quality than in others (also depending on the rules). The most obvious case is often generated code. So the analysis finds your generated parser badly formatted and poorly documented? Tell me something I don’t know already. Typically, generated code can be excluded right from the start. But your hand-written code also often comes in different flavors. A problem in the central job scheduler or the accounting core might be more pressing than a finding in that obscure feature that is only used by one customer. Often, such criticality can easily be defined based on the architecture or the namespaces of your system.
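Path-based classification can be sketched in a few lines of Python. The glob patterns, directory names, and the three criticality levels below are hypothetical examples of what an architecture-based mapping might look like; the first matching pattern wins, and excluded (generated) code gets no criticality at all.

```python
from fnmatch import fnmatchcase

# Hypothetical project layout: adapt these patterns to your own architecture.
# fnmatchcase's "*" also matches "/", so "*/generated/*" covers any depth.
EXCLUDED_PATTERNS = ["*/generated/*"]  # generated code is ignored entirely
CRITICALITY_PATTERNS = [
    ("src/core/scheduler/*", "high"),         # central job scheduler
    ("src/core/accounting/*", "high"),        # accounting core
    ("src/features/legacy_export/*", "low"),  # obscure single-customer feature
]
DEFAULT_CRITICALITY = "medium"


def criticality(path):
    """Return None for excluded code, else the level of the first matching pattern."""
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS):
        return None
    for pattern, level in CRITICALITY_PATTERNS:
        if fnmatchcase(path, pattern):
            return level
    return DEFAULT_CRITICALITY
```

For instance, `criticality("src/core/scheduler/JobQueue.java")` returns `"high"`, while any file under a `generated` directory returns `None` and can be dropped before anyone looks at it.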

Prioritize on criticality

This strategy is also not too surprising. Most tools assign severities or priorities to the rules they apply. You probably care more about a possible resource leak than about a local variable that is never read. So it is a good idea to focus on findings that belong to high-severity rules. But, just as with rule enablement, the default severity might not be right for every project. A possible SQL injection would be considered serious in most cases, but if you are developing a system whose input data is checked by some other mechanism, you might be less interested in such findings.
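Project-specific severity can be modeled as a small override table on top of the tools’ defaults. In this Python sketch, the rule names, the three-level scale, and the overrides (including the downgraded SQL injection rule) are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical three-level scale; real tools use their own severity names.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str  # the tool's default severity for this rule
    path: str


# Hypothetical project-specific overrides of the tools' default severities.
SEVERITY_OVERRIDES = {
    "sql-injection": "low",   # inputs are validated by another mechanism here
    "resource-leak": "high",  # long-running service: leaks hurt us
}


def effective_severity(finding):
    """Use the project override if there is one, else the tool's default."""
    return SEVERITY_OVERRIDES.get(finding.rule, finding.severity)


def at_least(findings, minimum):
    """Keep findings whose effective severity is at or above `minimum`."""
    threshold = SEVERITY_ORDER[minimum]
    return [f for f in findings
            if SEVERITY_ORDER[effective_severity(f)] >= threshold]
```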

Prioritize on insertion time

If you have many findings, you have probably lived with many of them for a long time, and they have caused no problems so far: all your tests pass and the code might even be in production. On the other hand, fixing the findings might introduce new problems. So if the findings have been there for some time or are in code you do not expect to touch in the near future, let them rest in peace. Of course, this does not apply to the super-critical possible race condition that the tool pointed you to, but for the majority of findings, only the newest ones are relevant. A common strategy is to only deal with findings that were newly introduced during the current development iteration or since the last release. One benefit of this strategy is that a developer is still familiar with the code behind these findings, and the code has to be tested regardless of whether the finding is removed or not.
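The tricky part of filtering by insertion time is deciding whether a finding is “the same” as one in an older snapshot, since line numbers shift whenever code above them changes. The Python sketch below assumes a hypothetical fingerprint made of rule, file, and whitespace-normalized source line, and compares the current findings against a baseline (for example, the last release); real tools track findings in more elaborate ways.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str
    line: int
    snippet: str  # source text of the flagged line


def fingerprint(finding):
    # Assumption: identify a finding by rule, file, and normalized source text
    # rather than by line number, which changes when code above is edited.
    return (finding.rule, finding.path, " ".join(finding.snippet.split()))


def new_findings(current, baseline):
    """Return findings in `current` without a counterpart in `baseline`."""
    remaining = Counter(fingerprint(f) for f in baseline)
    fresh = []
    for f in current:
        key = fingerprint(f)
        if remaining[key] > 0:
            remaining[key] -= 1  # matches an old finding: let it rest in peace
        else:
            fresh.append(f)      # no old counterpart left: newly introduced
    return fresh
```

Counting fingerprints instead of using a plain set means that a second copy of an old problem (for example, a pasted code line) still shows up as new.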

Semantic prioritization

There are cases where selecting only by rule type, code location, and insertion time is not enough. Maybe there is a rule for which you only want to see very bad findings or findings that are easy to resolve. To do this, your filter criterion has to actually understand the specific problem in the code (not just the idea behind the rule). My colleague Daniela Steidl is working on this kind of filtering and prioritization, but this is still early research and not yet supported by any available tools.

Putting it all together

When dealing with huge numbers of findings, you will not get away with a single answer. Instead, you need a combination of the suggested strategies. For example, you might want to get rid of all high-severity findings, while for all other findings you only care about the new ones in your non-experimental code. While many static analysis tools support some of these strategies, their focus is typically on analysis, so more complex filtering of findings is beyond their scope. Also, when using multiple tools, which can be worthwhile because tools often complement each other, you might want a consistent overview and filtering of findings instead of a separate list for each tool.
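The combined policy from this example translates into a simple predicate. This Python sketch assumes each finding already carries its severity and a precomputed `is_new` flag, and it treats everything under a hypothetical `src/experimental/` directory as experimental code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str  # "low", "medium" or "high"
    path: str
    is_new: bool   # assumed precomputed, e.g. by comparing with the last release


# Hypothetical location of experimental code in this example project.
EXPERIMENTAL_PREFIXES = ("src/experimental/",)


def is_experimental(path):
    return path.startswith(EXPERIMENTAL_PREFIXES)


def relevant(finding):
    # Strategy 1: high-severity findings are always shown, old or new.
    if finding.severity == "high":
        return True
    # Strategy 2: otherwise, only new findings in non-experimental code count.
    return finding.is_new and not is_experimental(finding.path)


def triage(findings):
    return [f for f in findings if relevant(f)]
```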

A common setup is to use quality management tools such as ConQAT, SonarQube, or Teamscale as an aggregator and filter for other static analysis tools. This way, the static analysis tools can focus on the analysis task, while the quality management tool provides a common interface to the results and enables all the strategies mentioned before.
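The core job of such an aggregator is to map each tool’s output onto one common finding format. As a minimal sketch (not taken from any of the tools named above), the Python code below normalizes Pylint’s JSON output (`--output-format=json`) and GCC-style warning lines into one hypothetical `Finding` record; the regular expression assumes POSIX-style paths without colons and only picks up lines reported as warnings.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical common format shared by all tools.
    tool: str
    rule: str
    path: str
    line: int
    message: str


def from_pylint_json(text):
    """Pylint's JSON format is a list of message objects."""
    return [Finding("pylint", m["symbol"], m["path"], m["line"], m["message"])
            for m in json.loads(text)]


# Matches e.g. "src/main.c:12:5: warning: unused variable 'x' [-Wunused-variable]"
GCC_WARNING = re.compile(
    r"^(?P<path>[^:\n]+):(?P<line>\d+):\d+: warning: "
    r"(?P<msg>.*?) \[(?P<flag>-W[^\]]+)\]$",
    re.MULTILINE,
)


def from_gcc_output(text):
    """Turn GCC warning lines into findings, using the -W flag as rule name."""
    return [Finding("gcc", m["flag"], m["path"], int(m["line"]), m["msg"])
            for m in GCC_WARNING.finditer(text)]
```

Once every tool’s results share one record type, the filtering strategies from this post can be applied uniformly instead of once per tool.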

What is your experience with static analysis tools? How do you deal with tons of results? Let us know!