Signal-to-Noise Ratio

2019-12-06 · 9 min

The signal-to-noise ratio is a measure used in science and engineering that compares the level of the desired signal to the level of background noise.
— Wikipedia

Knowledge should be transferred with as little interference as possible: the less noise we transmit alongside the relevant signal, the better.

There are different kinds of noise in our code: forced (external) noise and self-inflicted noise.

Forced or external noise

License mumbo-jumbo

The first thing you encounter in most files is forced noise, a copyright disclaimer with licensing.

The MIT license is about 19 lines long. Oracle Java uses about 24 lines of licensing mumbo-jumbo on every file. Why do we do this to ourselves?

Most likely, because the legal department forces us to do it. So we pollute every single file with the same crap over and over again, pushing the actual code down so far that we usually have to scroll before we can read the actual signal.

I understand the legal requirements of licenses, copyrights, etc. But why can we not use a more straightforward solution?

Some companies and projects are starting to use shorter notices, like Chromium or Golang. They replaced the long noise with a short 3–4 line summary, containing which license was used and where to find the actual license file:

// Copyright 2009 The Go Authors. All rights reserved.  
// Use of this source code is governed by a BSD-style  
// license that can be found in the LICENSE file.

In these 3 lines, they managed to offload the noise to another file AND make the legal department happy. Sometimes legal might not accept the license living in a different file, but maybe we could offload the noise to the bottom of the file, below the signal, and use a small summary with “please see below”.

We would still provide the license with every source file but wouldn’t waste too much screen real estate at first glance. I’m not a lawyer, I’m a software developer, so please ask your legal department whether such a solution is acceptable before changing anything.

Imports and Requirements

Imports and requirements are another forced noise.

Every language has them in some form, under names like import, require, or include. Most of the time, our IDEs manage them for us and fold them neatly into a single line so they won’t bother us. But combined with the common default of importing every single file, class, or module individually instead of using wildcards (if your language even supports them), they add a lot of lines.

Here’s an example from one of my Java codebases:

java

import java.time.LocalDate;  
import java.time.Year;

import java.util.List;  
import java.util.Map;  
import java.util.Objects;  
import java.util.Optional;  
import java.util.OptionalInt;  
import java.util.function.Function;  
import java.util.function.Supplier;  
import java.util.stream.Collectors;  
import java.util.stream.IntStream;

import org.apache.commons.lang3.StringUtils;  
import org.apache.commons.lang3.math.NumberUtils;

import org.slf4j.Logger;  
import org.slf4j.LoggerFactory;

15 import statements with additional blank lines. And these are just the statements for dependencies; I’ve removed another 24 statements for project-internal imports. Let’s see if we can reduce the noise by using wildcards:

java

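// A wildcard covers one package only, so subpackages need their own line.
import java.time.*;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;
import org.apache.commons.lang3.*;
import org.apache.commons.lang3.math.*;
import org.slf4j.*;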

We seldom really look at our imports. The IDE knows this and folds them away, so why bother with so much verbosity? Even when it’s hidden, it’s still there, carrying little signal but a lot of noise. Maybe your IDE will hide it, but what about other people reading the code?

A valid counterpoint to using wildcards is what you might call the glance effect: If you just glance at the imports, you might get an initial idea of what the code is doing.

The example above won’t give you any clues about its inner workings, but if you import a class called DateFormatter or a library IOUtils, you might get some information about the code without looking at the rest.

Due to IDEs, this is an ignorable noise compared to other noise sources, but we still should understand the implications of auto-hidden noise.

Self-induced noises

Noisy Logging

Logging is one of the most potent noise generators because we can pollute our code and our logfiles at the same time!

Some people like to log everything. EVERYTHING. EVERYTHING!

The reasoning is simple: it’s better to have a log than not to have one. But if our code looks like its primary purpose is logging, we are doing it wrong!

Our code has requirements, and so does our logging. But what is our logging requirement?

What do we want to log? What do we need to log?

Everything is not an acceptable answer.

If we really want to log everything, we also need to log the logging. But logging everything is requirements bankruptcy[¹]: we either don’t know or don’t care about our actual logging requirements.

Even if we ignore the performance impact of too much logging, we create harder-to-understand codebases with endless lines of unneeded logging statements. And if we eventually need some information from the log files, we won’t find it among all the logging nonsense.

By reducing the logging calls, you will end up with more concise and readable code. And your log files will be cleaner, too.
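
To make “what do we need to log?” a bit more concrete, here is a minimal sketch. It assumes a made-up requirement: operators only need to know when input lines were rejected. The class `InvoiceImporter`, the method `importLines`, and the rule that blank lines count as rejected are all hypothetical and exist only for this illustration. It uses `java.util.logging` so it runs without dependencies; the same idea applies to SLF4J.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical importer, used only to illustrate requirement-driven logging.
class InvoiceImporter {

    private static final Logger LOG = Logger.getLogger(InvoiceImporter.class.getName());

    // Counts rejected (blank) lines and returns that count.
    // Assumed requirement: operators must learn when lines were rejected.
    // So we write one summary entry, not one entry per line or per step.
    static int importLines(List<String> lines) {
        int rejected = 0;
        for (String line : lines) {
            if (line.isBlank()) {
                rejected++;
            }
            // No "processing line ..." or "line ok" entries here:
            // nobody asked for them, and they would bury the summary.
        }
        if (rejected > 0) {
            LOG.log(Level.WARNING, "Rejected {0} of {1} invoice lines",
                    new Object[] {rejected, lines.size()});
        }
        return rejected;
    }
}
```

A clean import produces no log output at all, and a problematic one produces exactly one entry that answers the question the log was kept for.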

Log, throw, forget

Another noise-increasing logging habit is log-and-throw.

Either handle or throw your exception
— Robert C. Martin, Clean Code

A typical example:

java

public void myCrashableMethod() {  
    try {  
        thisMightCrash();  
    }  
    catch (Exception e) {  
        log.error("It crashed.", e);  
        throw e;  
    }  
}

By just logging an exception but not handling it, we create an entry in our log files, but the actual problem still exists. The caller of our method might do the same, so now we have two log entries about the same exception and maybe still no proper handling of it. And what about the caller’s caller?

We should either handle an exception right away in a conclusive manner or let it propagate without catching it. Or maybe repackage it for the caller and let them deal with it, including logging it if necessary.
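
The three options could be sketched like this. It is only an illustration under assumptions: `ConfigLoader`, `ConfigException`, and the three method names are hypothetical and not taken from any real codebase.

```java
// Hypothetical domain exception, introduced only for this illustration.
class ConfigException extends RuntimeException {
    ConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical class showing one method per option.
class ConfigLoader {

    // Option 1: handle conclusively. The failure ends here,
    // and the caller gets a usable value.
    static int portOrDefault(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Option 2: do not catch at all. The exception travels up
    // to a caller that is able to deal with it.
    static int port(String raw) {
        return Integer.parseInt(raw.trim());
    }

    // Option 3: repackage with context and keep the original as the cause.
    // No logging here; whoever finally handles it decides about that.
    static int requiredPort(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new ConfigException("Invalid port setting: '" + raw + "'", e);
        }
    }
}
```

None of the three variants writes a log entry on its way up, so the exception is logged at most once, at the place where it is finally handled.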

Comments

We need to concentrate on intent, not action. The code itself should tell us as much as possible, and the comments should only supplement the missing parts.

Know what to document, and what doesn’t need an explanation.

java

// BAD:    Sets the value of size to 32  
// BETTER: Underlying data pipeline only supports size 32 
int size = 32;

// BEST: Improved name, no comment needed
int maxPipelineSize = 32;

Try to know your audience. Documentation will differ widely from internal comments about intent. By being precise and concise, you can reduce the noise and improve the signal.

I’ve written before about comments, and how they can be a plentiful source of noise, misinformation, and problems.

Unnecessary code

It’s easy to end up with more code than is actually needed.

Most languages support some kind of redundancy in declarations, like using visibility modifiers even though you’re using default visibility. Or auto-generated code that’s never used.

But every additional line of code has to be read and identified as excessive, so it’s best not even to have it in the first place.
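
A small Java illustration of such redundancy, with hypothetical names (`NoisyShape`, `Shape`, `Square`): interface members are public by default, so spelling the modifiers out adds characters but no information.

```java
// Noisy: every modifier here is already implied by the language.
// Interface fields are implicitly public static final,
// interface methods without a body are implicitly public abstract.
interface NoisyShape {
    public static final double UNIT = 1.0;

    public abstract double area();
}

// Same contract, same meaning, less to read.
interface Shape {
    double UNIT = 1.0;

    double area();
}

// Hypothetical implementation, only here to show the interface in use.
class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}
```

Both interfaces compile to the same contract; the second one simply leaves out what the compiler already knows.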

Sometimes we have written code that’s too nice to throw away, even when we don’t actually need it. So we keep it in the project, just in case.

Noise isn’t always as simple as overhead code that can easily be reduced and avoided by better habits in the future. There is also context switching, which will induce psychological noise.

It will strain your mind by introducing unnecessary cognitive overhead while navigating unfamiliar codebases. And our codebase can quickly become unfamiliar to ourselves after a very short time.

As soon as our code becomes unfamiliar, we have to invest some of our finite mental capacity to understand it again, capacity that might be better spent on other tasks. Just commenting the code out can also be really confusing, because now not even the compiler can check whether the code still works.

Instead of just letting code lie dormant, commented out or not, we can utilize the power of our version control system. Either delete the code, since it will always be retrievable from history, or move it to a new branch for later use. A branch even has the advantage that you can keep it up to date and check it against newly written code, so you might be able to merge it back to master when it’s finally needed.

Switching Contexts

Software development is based on abstractions. No one wants to type binary code directly as 0s and 1s. Even Alan Turing, one of the founders of computer science, wrote programs in a base-32 notation rather than in raw 0s and 1s.

The lowest-level abstractions, like assembler or shader languages, are only used when we really need them, for example due to performance requirements or hardware restrictions. Layer upon layer of abstraction is stacked until we have a convenient environment to interact with our data.

Every time we break through the layers, we need to adjust our mindset accordingly because we might end up in a different world with different rules.

After 50 lines of high-level business logic, why are we now shifting bits or doing pointer arithmetic, followed by more high-level code?

Reading code doesn’t necessarily mean that we understand every single line, but going through a block of unfamiliar high-level abstraction and finding low-level code is a bump in the road that forces our minds to switch contexts.

We should always try to stay on our layer of abstraction. Mixing layers disturbs the flow of reading and understanding. If we have a reason to switch contexts instead of abstracting the detail away, we should explain why. Sometimes it’s easier to do things with low-level code, but even then the functionality should be exposed in an understandable way.
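
One way to stay on a single layer is to give the low-level detail a small, named home. The following is a minimal sketch with hypothetical names (`Permissions`, `DocumentService`, and the assumed bit layout are invented for this example): the bit masks live in one class, and the business logic only sees intention-revealing methods.

```java
// Low-level layer: the bit layout is an assumption made for this example
// (bit 0 = read, bit 1 = write) and is confined to this one class.
final class Permissions {
    private static final int READ = 1;        // bit 0
    private static final int WRITE = 1 << 1;  // bit 1

    private final int bits;

    private Permissions(int bits) {
        this.bits = bits;
    }

    static Permissions fromBits(int bits) {
        return new Permissions(bits);
    }

    boolean canRead() {
        return (bits & READ) != 0;
    }

    boolean canWrite() {
        return (bits & WRITE) != 0;
    }
}

// High-level layer: reads as business logic, with no bit masks in sight.
class DocumentService {
    static String describeAccess(Permissions permissions) {
        if (permissions.canWrite()) {
            return "editor";
        }
        if (permissions.canRead()) {
            return "viewer";
        }
        return "no access";
    }
}
```

A reader of `describeAccess` never has to leave the business layer; whoever needs the bit-level details knows exactly where to look.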

Conclusion

We should always strive for cleaner and more concise code that won’t be a burden later on. The signal rate must be higher than the noise, or information might be lost in the static.

Not every noise is avoidable; some is even forced upon us. But that doesn’t mean we shouldn’t try to keep it as minimal as possible.

And even if we have to accept the noise, it might help to understand the reasoning behind it so that it fits better into our mental model of the code.

#best-practice
