In many software development projects, code metrics are used to capture technical software quality, express it in numbers, and thus make it measurable and transparent. While installing a tool and receiving a set of numeric values is quite simple, deriving useful quality-improving actions from those values is not. For us at CQSE, it all starts first and foremost with defining the analysis scope, something we call the art of code discrimination. Whenever we use any sort of metric to gain insights about a software system, we devote significant resources to getting this right, because only a cleanly defined analysis scope yields undistorted metric results. This sounds trivial to know and to do, yet it is omitted surprisingly often in practice.
So which code should you include when analyzing software quality?
Let’s start with a simple measuring task: How many lines of code does your system comprise? If you do not want to rely on gut feeling, you install a tool, let it do the counting, and get a result. But is that result the answer? Maybe. It depends on what you counted. Did you include test code? Should it be included? Did you exclude generated code? Should it be excluded? And what about that one component that was copied and pasted for some experiments, which unfortunately did not work out in the end? Counting lines of code is a very simple metric, yet the scope of the analyzed code has a substantial impact on it. The impact is even greater for more complex metrics such as clone coverage or comment completeness.
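To make the scope effect concrete, here is a minimal Python sketch that counts non-blank lines over a small, hypothetical file tree under three different scopes. The file contents, the glob patterns, and the helper `loc_in_scope` are illustrative assumptions, not part of any particular tool, and "non-blank lines" is only one of several common LOC definitions.

```python
from fnmatch import fnmatchcase

# Hypothetical repository content: relative path -> source text.
FILES = {
    "src/main/java/App.java": "class App {\n\n  void run() {}\n}\n",
    "src/test/java/AppTest.java": "class AppTest {\n  void testRun() {}\n}\n",
    "target/generated-sources/Parser.java": "class Parser {}\n" * 5,
}


def count_loc(text: str) -> int:
    """Count non-blank lines (one common, but not the only, LOC definition)."""
    return sum(1 for line in text.splitlines() if line.strip())


def loc_in_scope(files: dict[str, str], include, exclude=()) -> int:
    """Sum LOC over files matching any include pattern and no exclude pattern."""
    total = 0
    for path, text in files.items():
        included = any(fnmatchcase(path, pat) for pat in include)
        excluded = any(fnmatchcase(path, pat) for pat in exclude)
        if included and not excluded:
            total += count_loc(text)
    return total


print(loc_in_scope(FILES, include=["*"]))                         # everything: 11
print(loc_in_scope(FILES, include=["*"], exclude=["target/*"]))  # without generated code: 6
print(loc_in_scope(FILES, include=["src/main/*"]))               # application code only: 3
```

The same three files yield 11, 6, or 3 lines depending solely on the include and exclude patterns, which is why the scope has to be fixed before any number is reported.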
To analyze software quality, we care about code that is maintained manually. This comprises the application code (the actual code running in production) as well as the test code. However, we analyze the two separately, because many of our customers have different quality expectations for their test code. While application and test code are included in the analysis scope, code that is not maintained manually is excluded. The most prominent example is generated code, provided it is continuously regenerated by a tool. The maintainability of such generated code affects neither the developers nor the maintainability of the system. If a quality analysis of this part of the system is to be carried out, the input of the generator should be analyzed rather than its output.
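Deciding which files are generated is not always obvious from the directory structure. One common heuristic, sketched below in Python, is to look for marker comments or annotations in the file header, such as the `DO NOT EDIT` line that many generators emit or a `@Generated` annotation. The marker list and the 20-line header window are assumptions for illustration; such a check cannot tell whether a file is still continuously regenerated, so its results should be confirmed with the team.

```python
import re

# Assumed markers; real generators differ, so tune this list per project.
GENERATED_MARKERS = [
    re.compile(r"DO NOT EDIT", re.IGNORECASE),
    re.compile(r"@Generated\b"),
    re.compile(r"auto-?generated", re.IGNORECASE),
]


def looks_generated(text: str, header_lines: int = 20) -> bool:
    """Heuristically flag a file as generated if a marker appears in its header."""
    header = "\n".join(text.splitlines()[:header_lines])
    return any(marker.search(header) for marker in GENERATED_MARKERS)


print(looks_generated("// Code generated by a tool. DO NOT EDIT.\npackage api\n"))  # True
print(looks_generated('@Generated("parser-gen")\nclass Parser {}\n'))             # True
print(looks_generated("class App {}\n"))                                         # False
```

A hand-written file that merely mentions one of the markers would be a false positive, which is another reason to review the result rather than apply it blindly.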
If generated code is not discriminated properly, it is not surprising to end up with, for example, a clone coverage of 50% or more, while the clone coverage of the actual application code may be only about 7% or 8%. However, excluding generated code from the analysis completely is a somewhat defensive approach: there are certain quality aspects, such as potential null pointer dereferences, that you would like to be aware of even in generated code, because they affect the correct behavior of the system rather than its maintainability. To avoid information overload, it is a safe bet to start by analyzing the non-generated application code. Once this process is established, you may also consider the quality criteria relevant to your generated code.
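The distortion follows directly from clone coverage being the share of lines that belong to at least one clone. The Python sketch below mixes a hypothetical application of 200,000 lines at 7.5% clone coverage with 300,000 lines of highly repetitive generated code at 80%; all numbers are invented for illustration, and clones spanning both parts are ignored (they could only raise the combined figure).

```python
def clone_coverage(cloned_lines: int, total_lines: int) -> float:
    """Fraction of lines that are part of at least one clone."""
    return cloned_lines / total_lines


# Hypothetical line counts, for illustration only.
app_loc, app_cloned = 200_000, 15_000    # application code: 7.5%
gen_loc, gen_cloned = 300_000, 240_000   # generated code: 80%

app_only = clone_coverage(app_cloned, app_loc)
mixed = clone_coverage(app_cloned + gen_cloned, app_loc + gen_loc)

print(f"application code only: {app_only:.1%}")  # 7.5%
print(f"including generated:   {mixed:.1%}")     # 51.0%
```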
And then, in addition to application, test, and generated code, there is a whole lot of other code that should be excluded from the analysis scope, too. Very often, when developers are confronted with a finding, they respond with »Yeah, but we do not change this code.« This is in fact true for third-party or library code that has not been patched. As developers are neither responsible for the quality of third-party code nor affected by its maintainability, we exclude it from the analysis. Experimental code is excluded as well, along with unused code, tool code, and simulation code.
Depending on how well a system is organized, code discrimination can be fairly easy. With the Maven standard directory layout, for example, application code usually resides under /src/main/, test code under /src/test/, and generated code is placed under /target/. However, many systems are not that well organized. During numerous code quality audits, we have identified several factors that increase the likelihood of a chaotic system organization, for example the length of the system’s history, the size of the team, developer turnover, technology changes, or simply a lack of organizational discipline. Thus, for software systems that have grown over the years, discriminating the code properly can take days.
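For a repository that follows the Maven standard directory layout, a first-cut classification can be derived from the path alone. The Python sketch below is a hypothetical helper, not part of Maven or any analysis tool: it maps repository-relative paths to categories and deliberately reports everything outside the convention (third-party copies, experiments, tool scripts) as `unclassified`, since those files need a human decision.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a repository-relative path to an analysis category (sketch)."""
    parts = PurePosixPath(path).parts
    # Only inspect directories before the first "src", so that a Java package
    # named "target" (e.g. src/main/java/com/acme/target) is not misread.
    head = parts[: parts.index("src")] if "src" in parts else parts
    if "target" in head:
        return "generated"  # Maven build output, incl. target/generated-sources
    for first, second in zip(parts, parts[1:]):
        if (first, second) == ("src", "main"):
            return "application"
        if (first, second) == ("src", "test"):
            return "test"
    return "unclassified"  # outside the convention: needs a manual decision


for p in [
    "src/main/java/com/acme/App.java",
    "src/test/java/com/acme/AppTest.java",
    "target/generated-sources/antlr4/Parser.java",
    "module-a/src/main/java/com/acme/target/Foo.java",
    "lib/legacy-copy/Util.java",
]:
    print(p, "->", classify(p))
```

The fourth path shows why the check for `target` is limited to the directories before `src`, and the last one lands in `unclassified`, which is exactly the pile that takes days to sort out in a grown system.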
Such an unorganized system structure causes several problems during the software life cycle. First, any new developer joining your team will have a harder time getting an overview of the system. But long-time team members will also repeatedly face the question »Is this code still needed?«, for example when migrating the code to a new technology or when refactoring central functionality. Hence, regular code discrimination that filters out unused or experimental code helps avoid confusion.
To sum it up, code discrimination is a vital first step when measuring code quality, and it also helps keep the structure of your system well organized. Don’t start interpreting metric results until you have gotten the code discrimination right. In this case, »discrimination« is not to be condemned, but fundamentally necessary!