Code Health – How easy is your code to maintain and evolve?

Code quality issues cost time, money, and missed deadlines. To make good decisions, you need to know when you can safely move ahead and implement new features, and when you might have to take a step back and improve what’s already there. That way, your system remains maintainable, which is the foundation for developer productivity and great products.

CodeScene’s Code Health measure gives you a straightforward overview that lets your team drill down to the code with actionable recommendations. It’s a game changer: understanding the code quality of a large-scale codebase has never been this easy!

Fig. 26 CodeScene measures the code health across a codebase and presents KPIs.

Code Health identifies factors known to impact maintenance costs and delivery risks

The Code Health metric is based on patterns known to correlate with increased maintenance costs: patterns that make the code harder to understand and, hence, increase the risk of change and make the module more expensive to evolve. Recent research has also found that low code health leads to a high number of total security errors. Code health therefore captures both a productivity and a correctness dimension.

The Code Health score goes from 10 (healthy code that is relatively easy to understand and evolve) down to 1, which indicates code with severe quality issues. The score is calculated from a combination of properties of the code and organizational factors. In total, CodeScene calculates 25-30 factors, depending on the programming language. Examples include, but are not limited to, the following:

  • Brain Method: A single function/method that centralizes too much behavior and becomes a local hotspot.

  • Nested Complexity: This is typically revealed as if-statements inside other if-statements and/or loops, and is a construct that significantly increases the risk of defects.

  • Bumpy Road: A bumpy road is a function that fails to encapsulate its responsibilities, leading to code that contains multiple chunks of logic. Just like a bumpy road will slow down your driving, a bumpy road in code presents an obstacle to comprehension. There’s also an increased risk of feature entanglement. The remedy is often to extract and encapsulate the logically dispersed chunks of behavior in their own functions.

  • Developer Congestion: Code becomes a coordination bottleneck when multiple developers need to work on it in parallel (see Parallel Development and Code Fragmentation).

  • Knowledge Loss due to former contributors: If the developer behind a hotspot with low code health leaves the organization, the maintenance risk increases significantly.

  • DRY (Don’t Repeat Yourself) Violations: CodeScene detects duplicated logic that is actually changed together in predictable patterns.

  • Primitive Obsession: Code that relies heavily on built-in primitives such as integers, strings, and floats often lacks a domain language that encapsulates the validation and semantics of function arguments.
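To make the Bumpy Road item above concrete, here is a minimal Python sketch. The order data, the discount rule, and all function names are hypothetical and chosen only for illustration; this is not how CodeScene detects the pattern. The first function mixes two chunks of nested logic, while the second version applies the remedy named above by extracting each chunk into its own function.

```python
def summarize_bumpy(orders):
    """Bumpy version: two unrelated chunks of nested logic in one function."""
    # Chunk 1: filter out invalid orders.
    valid = []
    for order in orders:
        if order.get("quantity", 0) > 0:
            if order.get("price", 0.0) >= 0.0:
                valid.append(order)
    # Chunk 2: sum the order totals, with a (hypothetical) bulk discount.
    total = 0.0
    for order in valid:
        if order["quantity"] >= 10:
            total += order["quantity"] * order["price"] * 0.9
        else:
            total += order["quantity"] * order["price"]
    return total


def is_valid(order):
    """Extracted chunk 1: the validation rule lives in one place."""
    return order.get("quantity", 0) > 0 and order.get("price", 0.0) >= 0.0


def line_total(order):
    """Extracted chunk 2: pricing for a single order, including the discount."""
    amount = order["quantity"] * order["price"]
    return amount * 0.9 if order["quantity"] >= 10 else amount


def summarize_smooth(orders):
    """Smoothed version: reads as a single statement of intent."""
    return sum(line_total(order) for order in orders if is_valid(order))


orders = [
    {"quantity": 2, "price": 5.0},
    {"quantity": 10, "price": 1.0},
    {"quantity": 0, "price": 3.0},  # invalid: nothing ordered
]
print(summarize_bumpy(orders) == summarize_smooth(orders))
```

Both versions compute the same result; the difference is that each extracted function can be named, read, and tested on its own.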

The Code Health trends can be automatically supervised in your CI/CD pipeline and/or Pull Requests, so check out how to enable that integration: Integrate CodeScene with Pull Requests.

The Code Health profile and main KPIs

A single KPI isn’t enough to capture the multifaceted nature of code health in a larger codebase:

  • The hotspots could be healthy, but other parts of the code have severe issues.

  • Averages are tricky since they might hide low-scoring files that represent a risk.

  • Or maybe you have a problematic legacy module that is relatively stable but could be a long-term risk?

That’s why CodeScene presents three separate metrics to create a unique code health profile of your codebase:

Fig. 27 The three KPIs give you a representative view of the code health.

The KPIs represent:

  • Hotspot Code Health: A weighted average of the code health in your hotspots. Generally, this is the most critical metric since low code health in a hotspot will be expensive.

  • Average Code Health: A weighted average of all the files in the codebase. This KPI indicates how deep any potential code health issues go. Requires that you enable the Full Scan Code Health option.

  • Worst Performer: A single-file code health score representing the lowest code health of any module across the codebase. Requires that you enable the Full Scan Code Health option.

Advanced: How is the weighted average calculated?

The code health scores are aggregated by a weighted average. The weight is the number of lines of code (LoC) in each file. That way, a file with 5000 LoC carries more weight than a small file with 100 LoC. This provides a more accurate aggregated score.

Let’s say we have three files:

  1. a.c: code health 2.0, LoC 5000 (a massive problem)

  2. b.c: code health 10.0, LoC 100 (small and simple)

  3. c.c: code health 10.0, LoC 10 (even smaller and simpler)

Calculating a pure average gives a code health of 7.33, which is high and non-representative.

However, a weighted average gives a code health of 2.17 (11100 weighted points divided by 5110 LoC), which is much more representative of the codebase as a whole.
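The comparison above can be reproduced with a short Python sketch. The function names are hypothetical, and the code only illustrates the LoC-weighted average described in this section; it is not CodeScene’s implementation.

```python
def plain_average(files):
    """Unweighted mean of the code health scores."""
    return sum(health for health, _loc in files) / len(files)


def weighted_average(files):
    """Mean of the code health scores, weighted by lines of code."""
    total_loc = sum(loc for _health, loc in files)
    return sum(health * loc for health, loc in files) / total_loc


# (code health, LoC) for a.c, b.c, and c.c from the example above.
files = [(2.0, 5000), (10.0, 100), (10.0, 10)]

print(round(plain_average(files), 2))     # 7.33
print(round(weighted_average(files), 2))  # 2.17
```

The large, unhealthy file dominates the weighted result because it accounts for almost all of the code.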

Adapt Code Health to your Coding Standards

CodeScene’s code health rules are calibrated against real-world codebases. As such, the default rules represent the state of the art when it comes to predicting maintenance and delivery risks. That said, we understand that users want a certain level of control over the code health rules.

The code health rules are customized by adding a .codescene/code-health-rules.json file to your Git repositories. That way, the code health rules are persisted and version-controlled together with the application code they apply to.

CodeScene provides a template JSON file that includes documentation. You access that file via the Hotspots section of your project’s configuration. Let’s start by looking at the configuration options:

  • Example of overridden code health rules in .codescene/code-health-rules.json

{
  "usage" : "Persist this file inside your repositories as  ...",
  "rule_sets" : [ {
    "matching_content_path" : "test/**",
    "matching_content_path_doc" : "Specify a glob pattern relative to ...",
    "rules" : [ {
      "name" : "Brain Method",
      "weight" : 0.5
    }, {
      "name" : "Large Method",
      "weight" : 0.0
    } ]
  } ]
}

Starting from the JSON template that you get via CodeScene’s configuration view:

  1. Remove any rules whose default behavior you want to keep. This prevents clutter in the config file.

  2. Specify a weight of 0.0 to disable a rule. See “Large Method” above for an example.

  3. Specify a lower weight for the rules you want to keep but down-prioritize. See “Brain Method” above for an example. A value of 0.5 still implies a code health hit but only at 50% of the default impact.

  4. Commit the .codescene/code-health-rules.json file inside your repository.

Limit the customized rules to part of your code

The .codescene/code-health-rules.json file lets you limit the customization to a part of your codebase. This is done via glob patterns as specified by the matching_content_path field. Common examples include:

  • Differentiate between test and application code: Maybe you want to allow your test suites to grow slightly larger, or perhaps you want to allow a certain degree of code duplication between test methods. Specify a rule set for the pattern test/**, which means all code in a top-level test folder.

  • Use different rules for different programming languages: As an example, **/*.js means just JavaScript code. Other languages aren’t impacted by these overridden rules.

Note that you can have multiple rule sets – each one matching one piece of content – inside the same configuration file.
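As a rough sketch of a configuration with more than one rule set, the following Python snippet builds the JSON content and prints it. It uses only the field names, glob patterns, and rule names shown earlier in this section; the weight chosen for the JavaScript rule set is an arbitrary illustration, and the authoritative list of fields and rule names is the template you get from CodeScene’s configuration view.

```python
import json

# Two rule sets in one file, each matching a different part of the codebase.
config = {
    "rule_sets": [
        {
            # Test code: down-prioritize Brain Method, disable Large Method.
            "matching_content_path": "test/**",
            "rules": [
                {"name": "Brain Method", "weight": 0.5},
                {"name": "Large Method", "weight": 0.0},
            ],
        },
        {
            # JavaScript only: the 0.5 weight is an illustrative assumption.
            "matching_content_path": "**/*.js",
            "rules": [
                {"name": "Large Method", "weight": 0.5},
            ],
        },
    ]
}

# The printed text is what would go into .codescene/code-health-rules.json.
rendered = json.dumps(config, indent=2)
print(rendered)
```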

What happens to disabled rules?

  • Disabled rules are no longer part of the code health calculation. This means the reported code health can look better than the initially reported baseline.

  • Disabled rules will not be presented in the virtual code review.

  • Disabled rules will not be supervised as part of the delta analysis and PR quality gates.

To ensure transparency, CodeScene presents a searchable summary of all overridden rules. You find that summary under the Scope section of each analysis:

Fig. 28 CodeScene presents a searchable summary of all overridden code health rules

Advanced: use multiple rule sets and global code health rules

CodeScene lets you designate one specific repository, via the project configuration, that serves as the source for global rules. These global rules apply across all Git repos in your project. Typically, the global rules are used to reflect organization-wide coding rules.

The global rules are overridden by .codescene/code-health-rules.json inside each repository. Hence, the hierarchy is:

  1. Local rules inside a repository have the highest precedence.

  2. The global rules specified in your project configuration have the second-highest precedence.

  3. If none of those rules match, we use CodeScene’s defaults, meaning maximum relative weight for each rule.
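The precedence order above can be modeled with a small Python sketch. This is a simplified illustration, not CodeScene’s implementation: it ignores glob matching and treats each level as a plain mapping from rule name to weight. The function name, the dictionaries, and the assumption that the default corresponds to a weight of 1.0 are all hypothetical.

```python
DEFAULT_WEIGHT = 1.0  # assumption: the default means full impact for each rule


def resolve_weight(rule_name, local_rules, global_rules):
    """Return the effective weight: local first, then global, then default."""
    if rule_name in local_rules:
        return local_rules[rule_name]
    if rule_name in global_rules:
        return global_rules[rule_name]
    return DEFAULT_WEIGHT


# Hypothetical overrides for one repository and for the whole project.
local_rules = {"Brain Method": 0.5}
global_rules = {"Brain Method": 0.8, "Large Method": 0.0}

print(resolve_weight("Brain Method", local_rules, global_rules))       # 0.5
print(resolve_weight("Large Method", local_rules, global_rules))       # 0.0
print(resolve_weight("Nested Complexity", local_rules, global_rules))  # 1.0
```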

Recommendations for customizing code health rules

Our recommendations for customizing the code health rules:

  • Never disable the hard rules: The individual code health metrics come in two categories: 1) rules and 2) heuristics. While the heuristics might be at odds with your internal coding standards, the rules are hard to argue against. Disabling rules might mean that you miss an opportunity to act early on potential problems.

Auto-Detect Declining Code Health with the PR Integration

A code health decline can be expensive to reverse. To prevent it, integrate CodeScene in your Pull Requests: Integrate CodeScene with Pull Requests.