Equality and Comparison in Java: Pitfalls and Best Practices

2020-01-15 · 9 min

Java has different methods of comparing objects and primitives, each with its own semantics. Using the “wrong” one can lead to unexpected results and might introduce subtle, hard-to-catch bugs.

Before we can learn about the pitfalls and best practices of equality and comparison in Java, we need to understand the different kinds of types and their behavior.

Primitives vs. Objects

The Java type system is two-fold, consisting of eight primitive data types (boolean, byte, char, short, int, long, float, double), and object reference types.

Primitives

Primitives in Java can’t be null. Fields and array elements of a primitive type always have a default value, the equivalent of 0 for the specific data type (local variables are the exception: they must be assigned before they’re read):

 Primitive | Default Value 
-----------|--------------- 
 boolean   | false
 byte      | 0
 char      | '\u0000'
 short     | 0
 int       | 0
 long      | 0L
 float     | 0.0f
 double    | 0.0d
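
As a minimal sketch, the defaults can be observed on fields that are never assigned. The class name DefaultValues and its field names are made up for this illustration.

```java
// Hypothetical demo class; the fields are intentionally never assigned.
class DefaultValues {
    boolean flag;
    char letter;
    int count;
    long total;
    double ratio;

    public static void main(String[] args) {
        DefaultValues defaults = new DefaultValues();

        System.out.println(defaults.flag);               // false
        System.out.println(defaults.letter == '\u0000'); // true
        System.out.println(defaults.count);              // 0
        System.out.println(defaults.total);              // 0
        System.out.println(defaults.ratio);              // 0.0

        // Array elements get the same defaults as fields.
        int[] numbers = new int[3];
        System.out.println(numbers[0]);                  // 0
    }
}
```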

Primitive wrapper classes

Every primitive data type has a corresponding wrapper class in java.lang, encapsulating its value in a Java object:

 Primitive | Wrapper   | Superclass
-----------|-----------|------------
 boolean   | Boolean   | Object
 byte      | Byte      | Number
 char      | Character | Object
 short     | Short     | Number
 int       | Integer   | Number
 long      | Long      | Number
 float     | Float     | Number
 double    | Double    | Number

Being objects allows them to be used in a wider range of scenarios:

Generic types (e.g., List<Integer>).
Passing around a reference to a shared instance instead of a copy of the value (Java still passes the reference itself by value).
Ability to be null.
etc.

But we also have to deal with the disadvantages, like NullPointerException, a bigger memory footprint, and a performance impact.
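
The NullPointerException risk is easy to hit when a wrapper value that happens to be null is converted to a primitive. Here’s a small sketch; the class NullUnboxing and the map content are assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class showing what happens when a null wrapper becomes a primitive.
class NullUnboxing {

    // Returns true if converting the missing map entry to an int throws.
    static boolean unboxingNullThrows() {
        Map<String, Integer> stock = new HashMap<>();
        try {
            // get() returns null for a missing key; assigning it to an int
            // makes the compiler call intValue() on that null reference.
            int amount = stock.get("apples");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows()); // true
    }
}
```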

Autoboxing and unboxing

The last thing we need to understand before we can learn about equality and comparison is boxing.

Even though primitives and object references have different semantics, they can be used interchangeably, thanks to the Java compiler.

Autoboxing is the automatic conversion of primitive types into their corresponding wrapper class, and unboxing is the other direction. This allows us to use both kinds of types without discrimination:

java

List<Integer> values = new ArrayList<>();

for (int i = 0; i < 50; i++) {
    values.add(i);
}

Our List uses the wrapper type Integer, but our code compiles even though we add an int. That’s possible thanks to the compiler changing our code by autoboxing the i:

java

List<Integer> values = new ArrayList<>();

for (int i = 0; i < 50; i++) {
    values.add(Integer.valueOf(i));
}

The same is true the other way around:

java

int sumEven(List<Integer> values) {
    int result = 0;
    for (Integer summand: values) {
        if (summand % 2 == 0) {
            result += summand;
        }
    }
    return result;
}

Even though we use operators like % and + that aren’t available to the object type Integer, the code compiles fine because the compiler unboxes the wrapper type. The actual compiled code looks more like this:

java

int sumEven(List<Integer> values) {
    int result = 0;
    for (Integer summand: values) {
        if (summand.intValue() % 2 == 0) {
            result += summand.intValue();
        }
    }
    return result;
}

Equality

If we look at other programming languages, the obvious way to compare values seems to be the == operator and its counterpart !=.

Yes, we can use them to check for equality, and they compare values against each other, but it might not be the value you’re expecting.

Primitives

Primitive variables hold their values directly, so they can be tested for equality with ==.

Except when they can’t.

In contrast to the other primitive data types, the floating-point data types float and double can’t reliably be checked for equality with ==, due to their binary representation. Many decimal fractions, like 0.1, can’t be stored exactly, so results get rounded:

java

float value = 1.0f;
value += 0.1f;      // 1.1f
value += 0.1f;      // 1.2f
value += 0.1f;      // 1.3000001f

boolean isEqual = (value == 1.3f); // false

We’ve got two options to deal with this: either use java.math.BigDecimal, which represents decimal values exactly, or use threshold-based comparisons:

java

float value = 1.0f;
value += 0.1f;      // 1.1f
value += 0.1f;      // 1.2f
value += 0.1f;      // 1.3000001f

final float THRESHOLD = 0.00001f;
boolean isEqual = Math.abs(value - 1.3f) < THRESHOLD; // true
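
For the BigDecimal option, a minimal sketch of the same calculation could look like this. The class and method names are made up for the example, and the values are created from strings so no binary rounding sneaks in.

```java
import java.math.BigDecimal;

// Hypothetical demo class: the float example redone with exact decimal arithmetic.
class ExactDecimal {

    // Adds 0.1 three times to 1.0.
    static BigDecimal sum() {
        BigDecimal value = new BigDecimal("1.0");
        BigDecimal step = new BigDecimal("0.1");
        for (int i = 0; i < 3; i++) {
            value = value.add(step); // BigDecimal is immutable, add() returns a new instance
        }
        return value;
    }

    public static void main(String[] args) {
        BigDecimal expected = new BigDecimal("1.3");

        System.out.println(sum());                          // 1.3
        System.out.println(sum().compareTo(expected) == 0); // true
    }
}
```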

Arrays

Another pitfall is arrays of primitives. Arrays aren’t a primitive type, they’re objects, so == compares their references and not their elements.
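
A short sketch of the difference, using java.util.Arrays for the element-wise check. The class and helper method names are made up for this example.

```java
import java.util.Arrays;

// Hypothetical demo class comparing two arrays with the same content.
class ArrayEquality {

    static boolean sameReference(int[] left, int[] right) {
        return left == right;
    }

    static boolean sameContent(int[] left, int[] right) {
        return Arrays.equals(left, right);
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};

        System.out.println(sameReference(first, second)); // false, two different array objects
        System.out.println(first.equals(second));         // false, arrays inherit Object.equals
        System.out.println(sameContent(first, second));   // true, compared element by element
    }
}
```

For nested arrays, Arrays.deepEquals does the same job recursively.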

Objects

If you compare objects with ==, it also compares values. The catch is that the value of an object variable is its reference, hence the name object reference type.

This means two values are only equal if they point to the same object in memory.

In practice, variables might be equal in some cases, but not in others:

java

String a = "a";
String b = "b";
String ab = "ab";

boolean result1 = (a == "a");      // true
boolean result2 = (ab == "ab");    // true
boolean result3 = (a + b == "ab"); // false

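String literals are interned, so a and "a" (and likewise ab and "ab") refer to the same object, which makes result1 and result2 true. And result3 is false because a + b isn’t a compile-time constant here, so it’s concatenated at runtime and creates a new object in memory. Had a and b been declared final, the compiler would fold a + b into the constant "ab" and result3 would be true. Relying on == for strings is fragile for exactly that reason.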
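
Comparing the content with equals sidesteps the question of which strings share an object. A minimal sketch, with class and method names made up for the example:

```java
// Hypothetical demo class contrasting reference and content comparison of strings.
class StringEquality {

    // Compares content, regardless of how the strings were created.
    static boolean sameContent(String left, String right) {
        return left.equals(right);
    }

    public static void main(String[] args) {
        String a = "a";
        String b = "b";
        String joined = a + b; // concatenated at runtime, so it's a new object

        System.out.println(joined == "ab");            // false
        System.out.println(sameContent(joined, "ab")); // true
        System.out.println(joined.intern() == "ab");   // true, intern() returns the pooled instance
    }
}
```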

Another “not so obvious” pitfall can happen with primitive wrapper types:

java

Integer a = 127;
Integer b = 127;
Integer c = 128;
Integer d = 128;
boolean equal      = (a == b); // true
boolean notEqual   = (c == d); // false
boolean equalAgain = (new Integer(128) == 128); // true

What? This one took me by surprise, too.

The valueOf(...) methods of java.lang.Integer and java.lang.Long cache values for a specific range (-128 to 127 by default), making a and b the same object, but not c and d. Autoboxing goes through valueOf, so it’s affected too. And because comparing a wrapper with a primitive unboxes the wrapper, equalAgain is true.
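
Comparing the wrapped values instead of the references gives the same answer inside and outside the cached range. A small sketch, with a made-up class and helper name:

```java
import java.util.Objects;

// Hypothetical demo class: value comparison of wrapper objects.
class WrapperEquality {

    // Null-safe value comparison of two boxed integers.
    static boolean sameValue(Integer left, Integer right) {
        return Objects.equals(left, right);
    }

    public static void main(String[] args) {
        Integer c = 128;
        Integer d = 128;

        System.out.println(c.equals(d));                  // true
        System.out.println(c.intValue() == d.intValue()); // true, compares primitives
        System.out.println(sameValue(c, d));              // true
        System.out.println(sameValue(null, d));           // false, and no NullPointerException
    }
}
```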

Object.equals(Object other) and Object.hashCode()

The java.lang.Object class provides an equals method for all its subclasses, with a quite simple implementation:

java

public boolean equals(Object obj) {
    return (this == obj);
}

By default, every one of our types inherits the “problematic” comparison of object references. To be able to use equals for actual equality, we need to override it in our types, and the implementation has to satisfy certain properties:

Reflexive: An object should be equal with itself: obj.equals(obj) == true.
Symmetric: If a.equals(b) == true, then b.equals(a) must also be true.
Transitive: If a.equals(b) == true and b.equals(c) == true, then a.equals(c) should be true.
Consistent: a.equals(b) should always have the same value for unmodified objects.
Null handling: a.equals(null) should be false.
Hash code: Equal objects must have the same hash code.

If we provide our own equals method, we also need to override hashCode.
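
Hash-based collections show why the two belong together: they look up the hash code first and only then call equals. The following sketch uses a made-up Tag class that assumes a non-null name.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value type used only for this illustration; name is assumed to be non-null.
final class Tag {
    private final String name;

    Tag(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return name.equals(((Tag) obj).name);
    }

    // Derived from the same field as equals, so equal tags share a hash code.
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

class TagDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        tags.add(new Tag("java"));

        System.out.println(tags.contains(new Tag("java"))); // true
    }
}
```

Without the hashCode override, the lookup would most likely fail, because two distinct instances usually end up with different identity hash codes.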

Since Java 7, the class java.util.Objects provides helpers for simplifying our code:

java

import java.util.Objects;

class MyClass {

    private final String title;
    private final Integer value;

    public MyClass(String title, Integer value) {
        this.title = title;
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {

        // Reflexive
        if (this == obj) {
            return true;
        }

        // Null-handling
        if (obj == null) {
            return false;
        }

        // Different types can't be equal
        if (getClass() != obj.getClass()) {
            return false;
        }
        MyClass other = (MyClass) obj;

        // Let the helper do the rest
        return Objects.equals(this.title, other.title) &&
               Objects.equals(this.value, other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(this.title,
                            this.value);
    }
}

Be aware of the class comparison with getClass(). We might be inclined to use instanceof to compare objects, but as soon as a subclass overrides equals, this can break symmetry and the general contract between equals and hashCode: Equal objects must have the same hash code.
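
To make the instanceof risk concrete, here’s a sketch with two made-up classes, GridPoint and ColorPoint. Both use instanceof, and the subclass adds a field to its definition of equality.

```java
// Hypothetical base class that checks the type with instanceof.
class GridPoint {
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof GridPoint)) {
            return false;
        }
        GridPoint other = (GridPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Hypothetical subclass with its own idea of equality (color is assumed to be non-null).
class ColorPoint extends GridPoint {
    final String color;

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ColorPoint)) {
            return false;
        }
        return super.equals(obj) && color.equals(((ColorPoint) obj).color);
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + color.hashCode();
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        GridPoint point = new GridPoint(1, 2);
        ColorPoint colorPoint = new ColorPoint(1, 2, "red");

        System.out.println(point.equals(colorPoint));                  // true
        System.out.println(colorPoint.equals(point));                  // false, symmetry is broken
        System.out.println(point.hashCode() == colorPoint.hashCode()); // false, despite equals being true
    }
}
```

With a getClass() check in both classes, the two objects would simply be unequal in both directions.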

Of course, we can design our objects so that even subclasses are equal to their parents. But the definition of equality must be the same for both, so the equality check and the hash code calculation must live in the base class.

The classes java.util.Date and its subclass java.sql.Date are defined that way. The sql version doesn’t override equals or hashCode, and the base class compares and builds its hash code solely from the timestamp.
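
A quick sketch of that behavior, assuming the java.sql module is available (it is in a standard JDK). The class and method names are made up for the example.

```java
import java.util.Date;

// Hypothetical demo class: a java.util.Date and a java.sql.Date with the same timestamp.
class DateEquality {

    // Checks equality in both directions plus the hash codes.
    static boolean equalBothWays(long timestamp) {
        Date utilDate = new Date(timestamp);
        Date sqlDate = new java.sql.Date(timestamp);

        return utilDate.equals(sqlDate)
            && sqlDate.equals(utilDate)
            && utilDate.hashCode() == sqlDate.hashCode();
    }

    public static void main(String[] args) {
        long timestamp = 1_579_046_400_000L; // arbitrary instant chosen for this example

        System.out.println(equalBothWays(timestamp)); // true
    }
}
```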

Another example is collection classes: java.util.ArrayList and java.util.LinkedList are both subclasses of java.util.AbstractList and follow the equals and hashCode contract of java.util.List that it implements. Equality for collections is most of the time defined by the equality of their content, so using instanceof and not a hard class check seems appropriate.
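
The effect can be sketched in a few lines; the class name ListEquality is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class: two different List implementations with the same content.
class ListEquality {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(Arrays.asList(1, 2, 3));

        System.out.println(arrayList.equals(linkedList));                  // true
        System.out.println(linkedList.equals(arrayList));                  // true
        System.out.println(arrayList.hashCode() == linkedList.hashCode()); // true
    }
}
```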

Comparison

Just testing for equality is seldom enough. The other significant kinds of operations are comparisons of values.

Primitives

Like in other languages, we can compare the values of primitives with the <, >, <=, and >= operators.

The same problems of floating-point data types apply to them, so be aware. Also, boolean isn’t comparable except for equality with == and !=.

java.lang.Comparable

Objects don’t support these operators. To compare object types we need to implement the interface java.lang.Comparable<T> with its single method int compareTo(T).

The result of left.compareTo(right) is supposed to be the following:

 Result                  | Meaning / Order
-------------------------|---------------------------
 0                       | left is the same as right
 negative (typically -1) | left < right
 positive (typically 1)  | left > right

The result represents the natural order of our type, not just arithmetical comparability. This way, we can make collections of our type sortable.
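
As a minimal sketch, a made-up Version type could define its natural order like this, ordering by major number first and by minor number second:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical type: versions are ordered by major, then by minor number.
final class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class VersionDemo {
    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>();
        versions.add(new Version(1, 10));
        versions.add(new Version(1, 2));
        versions.add(new Version(0, 9));

        Collections.sort(versions); // uses compareTo, the natural order

        System.out.println(versions); // [0.9, 1.2, 1.10]
    }
}
```

Note that 1.10 sorts after 1.2, which a plain string or floating-point comparison would get wrong. A complete type would also override equals and hashCode so that they agree with compareTo.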

Best Practices

There are some simple rules we should follow not to get the wrong results when comparing values for equality or their natural order.

Never compare objects with ==

It only works if it’s the same object. Different objects with the same value are not equal. Always use boolean equals(Object other) to compare for equality.

Always implement equals and hashCode if needed

To make a type testable for equality, we need to implement both equals and hashCode to ensure correct and consistent behavior.

Floating-point data types aren’t exact

Always remember that floating-point data types aren’t exact and are harder to compare.

If we need to work with decimal values and need absolute precision, we should always use java.math.BigDecimal. But be aware that its equals also takes the scale into account, not just the numerical value:

java

BigDecimal a = new BigDecimal("2.0");
BigDecimal b = new BigDecimal("2.0");
BigDecimal c = new BigDecimal("2.00");

boolean equal    = a.equals(b); // true
boolean notEqual = a.equals(c); // false

If we need a more relaxed comparison, we can use compareTo:

java

BigDecimal a = new BigDecimal("2.0");
BigDecimal b = new BigDecimal("2.0");
BigDecimal c = new BigDecimal("2.00");

boolean equal = a.equals(b);    // true
int result    = a.compareTo(c); // 0

The BigDecimal API isn’t pretty compared to primitive operations, primarily due to its immutability. But that’s actually a good thing: correctness comes before beauty.

Be aware of autoboxing/unboxing

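Because the compiler does this behind the scenes, we must be sure whether we’re comparing primitives or wrapper objects. With ==, wrapper objects are compared by reference, and the Integer/Long caching makes that appear to work for small values only.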

To be 100% sure, we could use Comparable<T>#compareTo(T) of the wrapper types instead, which always uses the encapsulated value, and not the object reference.
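
A short sketch of these value-based comparisons, with a made-up class and helper name. The last line shows one more trap: equals is also type-sensitive.

```java
// Hypothetical demo class: comparing wrapper objects by their encapsulated value.
class WrapperComparison {

    // True if both wrappers hold the same value, regardless of object identity.
    static boolean sameValue(Long left, Long right) {
        return left.compareTo(right) == 0;
    }

    public static void main(String[] args) {
        Long first = 1_000L;
        Long second = 1_000L;

        System.out.println(sameValue(first, second));    // true
        System.out.println(first.equals(second));        // true
        System.out.println(Long.compare(first, second)); // 0

        // The int literal is boxed to an Integer, which is never equal to a Long.
        System.out.println(first.equals(1_000));         // false
    }
}
```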