Code Biomarkers–A Virtual Code Reviewer

In medicine, a biomarker is a measure that might indicate a particular disease or physiological state of an organism. CodeScene’s biomarkers do the same for code. Combined with biomarker trends, they give you a high-level summary of the state of your hotspots and the direction your code is moving in.

Fig. 17 Code Biomarkers show the status of your hotspots at a glance.

CodeScene’s biomarkers are like an extra, virtual team member that constantly reviews your code. Let’s look into the biomarkers.

The Ideas Behind Code Biomarkers

We at Empear make heavy use of CodeScene ourselves. We use the tool as part of our services. Over the past years we have analyzed hundreds of different codebases, and there are some patterns that we have seen repeated over and over again. Thus, we started to implement support in CodeScene to auto-detect those patterns, and we called the feature biomarkers.

The name biomarkers requires a brief explanation. In general, we wanted to avoid terms like “quality” or “maintainability” since they are easy to game and, more seriously, suggest an absolute truth. Instead, we find that the trend is what matters most: is the code evolving in the desired direction? In addition, an algorithm, no matter how smart, can only take us so far. At some level we want a human in the loop, and the code biomarkers are there to support that human by priming them on what to look for in the specific hotspot. Let’s look at some examples.

Explore your Code’s Biomarkers

If CodeScene has biomarker support for your language (see X-Ray for a list of supported languages), you will get a high-level trend on your dashboard as shown in Fig. 18.

Fig. 18 Code Biomarkers summary on the analysis dashboard.

As the dashboard shows, code biomarkers are scored from A to E, where A is the best and E indicates code with severe potential problems. In this example, the codebase has improved over the past month, as indicated by the move from a D score to a C.
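To make the scale concrete, here is a minimal Python sketch that reads the direction of a trend from two scores. It assumes only what the dashboard shows: scores run from A (best) to E (worst). The `score_trend` function is hypothetical. It only illustrates how to interpret the letters and says nothing about how CodeScene computes the scores themselves.

```python
# Hypothetical sketch: classify the trend between two biomarker scores.
# Assumption: scores are single letters A-E, with A best and E worst,
# as shown on the dashboard. This is not CodeScene's implementation.

GRADES = "ABCDE"  # ordered from best (A) to worst (E)


def score_trend(previous: str, current: str) -> str:
    """Return 'improved', 'degraded', or 'unchanged' for two A-E scores."""
    for grade in (previous, current):
        # Require exactly one letter so that "" or "AB" are rejected too.
        if len(grade) != 1 or grade not in GRADES:
            raise ValueError(f"unknown biomarker score: {grade!r}")
    # A lower index is a better score, so moving to a lower index is an improvement.
    delta = GRADES.index(previous) - GRADES.index(current)
    if delta > 0:
        return "improved"
    if delta < 0:
        return "degraded"
    return "unchanged"


if __name__ == "__main__":
    print(score_trend("D", "C"))  # prints: improved
    print(score_trend("D", "E"))  # prints: degraded
```

The first call mirrors the dashboard example above (D to C). The second mirrors a hotspot that slips from D to E, which is the kind of warning sign discussed further down.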

Biomarkers Present Actionable Metrics

Before we move on, how do we know that the biomarkers and scores are relevant? Well, the biomarkers are built on top of CodeScene’s other metrics and behavioral data. That means we only score the prioritized parts of the codebase, the ones that are most likely to impact development and maintenance costs, as shown in Fig. 19.

Fig. 19 Biomarkers are built on top of CodeScene’s prioritized hotspots.

Using this principle, Code Biomarkers fill a number of important gaps:

  • Bridge the gap between developers and non-technical stakeholders: The biomarkers visualization gives managers information that helps them decide when to take a step back, invest in technical improvements, and measure the effects.

  • Get immediate feedback on improvements: The biomarker trends give you immediate and visual feedback on the investments you make in refactorings.

  • Share an objective picture of your code quality: The biomarker scores are based on baseline data from thousands of codebases, and your code is scored against an industry average of similar codebases.

  • Get suggestions on where to start refactorings: The code biomarkers hint at specific problems in each file, which in turn suggests refactorings that could address the findings.

Let’s demonstrate those properties by taking a more detailed look at the biomarkers in Fig. 20.

Fig. 20 Detailed Biomarkers for a specific project.

The biomarkers in Fig. 20 provide detailed indications for each prioritized hotspot. We note that the file QueryTestBase.cs has been successfully refactored since last month. We also note the warning sign for GraphUpdatesTestBase.cs (see the yellow marker to the left in the figure), which has degraded from a D to an E.

We get more details when we hover over one of the high-level descriptions as shown in Fig. 21.

Fig. 21 Detailed Biomarkers for a specific hotspot.

Use these detailed biomarkers to initiate refactorings. We also recommend running an X-Ray analysis on the hotspot to get more insights now that we know what to look for. Fig. 22 shows an example X-Ray of QueryTestBase.cs.

Fig. 22 Use X-Ray to follow up on the biomarkers.

We’ll return to our discussion on how to act upon the biomarker indications towards the end of this guide. Before we get there, it’s important to note that CodeScene includes social biomarkers too. You see an example of this in Fig. 23.

Fig. 23 Social Biomarker indication found in a specific hotspot.

In this case, CodeScene noted that seven separate developers have worked on the code over the past weeks, and this fragmentation (see Parallel Development and Code Fragmentation) puts the code at risk for defects and unexpected feature interactions. High developer congestion might also make the code harder to understand, since any mental models we have of the code are likely to become outdated quickly due to the massive parallel work on it.
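As a rough illustration of the signal behind this social biomarker, the sketch below counts the distinct developers who changed a file within a recent time window. Everything in it is hypothetical: the `Commit` record, the file name, the four-week window, and the threshold of five developers are assumptions chosen for the example. A plain author count is a simpler stand-in for whatever measure CodeScene actually uses.

```python
# Hypothetical sketch: flag developer congestion on a single file.
# Assumptions (not CodeScene's rules): history is already parsed into
# (path, author, day) records, the window is 4 weeks, the threshold is 5.
from datetime import date, timedelta
from typing import Iterable, NamedTuple, Set


class Commit(NamedTuple):
    path: str
    author: str
    day: date


def recent_authors(commits: Iterable[Commit], path: str, today: date,
                   weeks: int = 4) -> Set[str]:
    """Return the distinct authors who changed `path` in the last `weeks` weeks."""
    since = today - timedelta(weeks=weeks)
    return {c.author for c in commits if c.path == path and since <= c.day <= today}


def is_congested(commits: Iterable[Commit], path: str, today: date,
                 weeks: int = 4, threshold: int = 5) -> bool:
    """True when at least `threshold` distinct developers changed `path` in the window."""
    return len(recent_authors(commits, path, today, weeks)) >= threshold


if __name__ == "__main__":
    today = date(2024, 1, 31)
    # Seven developers touch the same hypothetical file on seven recent days...
    history = [Commit("ExampleHotspot.cs", f"dev{i}", today - timedelta(days=i))
               for i in range(7)]
    # ...plus one old commit that falls outside the four-week window.
    history.append(Commit("ExampleHotspot.cs", "dev_old", today - timedelta(weeks=10)))
    print(len(recent_authors(history, "ExampleHotspot.cs", today)))  # prints: 7
    print(is_congested(history, "ExampleHotspot.cs", today))         # prints: True
```

Restricting the count to a recent window matters because the concern described above is about parallel work happening now: developers who left the file months ago no longer make current mental models go stale.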

The Future of Code Biomarkers

This is an early release of the biomarkers concept. We have been using biomarkers internally for our services and found that they save us a lot of time and manual inspection. That’s why we decided to include them in the product and share them with you.

We plan to extend the biomarker support to more programming languages. We also have prototypes for several other types of markers that we can detect in the evolution of code, so the concept is likely to expand over time. We also plan to provide more detailed trends and information on each detected biomarker.

As always, if you lack support for a particular language, please let us know and we’ll try to support it.