Understanding Java Generics: Key Features and Common Pitfalls

2024-06-26 · 21 min

Generics are an indispensable feature in the world of Java programming, shaping how we write and interact with code. Despite their ubiquity and importance, many developers only scratch the surface of what Generics can do.

While they provide powerful tools for creating flexible and reusable code, the intricacies and subtleties often remain a mystery to many, myself included.

That’s why I wrote this article to demystify Generics and offer an overview of their features and common pitfalls so we can confidently harness their full potential.

Why Generics?

Java Generics were introduced in Java 5 in 2004, marking a significant evolution of the language that enhanced type safety and enabled more robust and reusable code.

Before Generics, non-specialized collection and container types simply stored instances of Object, requiring explicit casting and type checking, which made them not only cumbersome but also error-prone. When everything is an Object, the compiler can’t help out, and one wrong casting decision and boom! ClassCastException.

The following non-Generic, raw code compiles fine, apart from the compiler nagging about unchecked or unsafe operations:

java

List numbers = new ArrayList();
numbers.add(5);
numbers.add(23);
numbers.add("42");

for (Object obj : numbers) {
    Integer value = (Integer) obj;
    boolean isEven = value % 2 == 0;
    System.out.println(value + " is even: " + isEven);
}

As you might have already suspected, the code explodes when run, as the String “42” is not an Integer, generating the following output:

Note: Test.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
5 is even: false
23 is even: false
Exception in thread "main" java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Integer (java.lang.String and java.lang.Integer are in module java.base of loader 'bootstrap')
    at Test.main(Test.java:11)

This little example already perfectly illustrates the motivation behind Generics: Type safety.

The primary purpose of Generics is to let classes, interfaces, and methods be parameterized with the types they are permitted to work with. This explicit definition of types provides compile-time safety, as the compiler now has insight into what you actually want to do.

By enforcing type constraints at compile-time, Generics eliminate the need for explicit type casting and checking, effectively moving many type-related runtime crashes to compile-time. Not only does this reduce runtime errors and make your code more reliable, as problems are caught earlier, but your code also becomes more expressive, as the types communicate more clearly what they hold.

Generics also aim to improve code reusability and maintainability. By defining Generic classes and methods, developers can write more flexible and reusable code that works with any type. For example, we could’ve created an IntegerList or a StringList to have a type-safe contract that’s validated at compile-time. But then, what about the many other types we might want to store in a List? We’d either duplicate a lot of code or create elaborate type hierarchies to share code between them.

Instead, the Generic List<T> is a single interface that can handle any single type for its content, and the compiler makes sure we use it as intended without needing to cast anything.

There are a few limitations on how the compiler checks the related types, and the implementation is not 100% generic. More on that later in the article!

Thanks to Generics, only a single interface is needed without duplicating the implementation for each type. This not only reduces code duplication but also makes the code easier to understand and maintain.
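To make the contrast with the opening example concrete, here is a minimal sketch of the same loop using a List<Integer>. The class and method names (TypedNumbers, describe) are made up for illustration; the point is that the String is now rejected by the compiler instead of crashing at runtime:

```java
import java.util.ArrayList;
import java.util.List;

class TypedNumbers {

    static List<String> describe(List<Integer> numbers) {
        List<String> lines = new ArrayList<>();
        for (Integer value : numbers) { // no cast needed, the element type is known
            boolean isEven = value % 2 == 0;
            lines.add(value + " is even: " + isEven);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(5);
        numbers.add(23);
        // numbers.add("42"); // DOES NOT COMPILE: a String is not an Integer
        numbers.add(42);

        describe(numbers).forEach(System.out::println);
    }
}
```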

Overall, the motivation behind Java Generics was to provide a more powerful, expressive, and type-safe language construct that addresses the limitations of pre-generics Java. By allowing types to be parameters, Generics bring the benefits of polymorphism and abstraction to a new level, enabling developers to write cleaner, more efficient, and less error-prone code.

Type Erasure

One thing you might already be familiar with, or might have noticed in the previous paragraphs, is the focus on “compile-time”. That’s because Generic types are erased during compilation, effectively converting any Generic type into its raw type. A List<Integer> effectively becomes a plain List at runtime, and all its methods work on Object instead of Integer.
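A quick way to observe erasure is comparing the runtime classes of two differently parameterized lists. In this sketch (class name hypothetical), both lists share the very same Class object:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();
        // Both parameterizations share a single runtime class: java.util.ArrayList
        return strings.getClass().equals(integers.getClass());
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints true
        System.out.println(new ArrayList<String>().getClass().getName()); // prints java.util.ArrayList
    }
}
```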

You might ask, “why introduce a feature to increase type-safety and then remove that safety at runtime?” It’s not that this is an impossible feature, as there are many languages that retain Generic type information at runtime:

C# uses the concept of reification for its Generics, whereas in Java only certain types are reifiable, such as primitives, non-Generic types, raw types, and unbounded wildcard types like List<?> (JLS §4.7).
C++ templates generate separate copies for each type used.
Kotlin has the reified keyword, although only for type parameters of inline functions.

So, why did Java decide to use type erasure?

Well, it’s because of one of Java’s best (and sometimes worst) aspects: backward compatibility.

By replacing all Generic types with either Object or their specified bound, the generated Bytecode is compatible with any code and runtimes that existed before.

Nowadays, we don’t think much about it, as everything uses Generics, and most likely, all the code we’re using has evolved since Java 5. But in 2004, the Java world looked different. A considerable ecosystem already existed before the introduction of Generics, so the Java team of the past made sure that a seamless transition to Generics was possible.

The Bytecode generated from Generic and non-Generic code is uniform, meaning it follows the same structure and uses the same type information. This uniformity guarantees that the JVM can execute both without any special handling or changes to the JVM itself.

Of course, there are downsides to choosing type erasure, too, primarily the lack of type information at runtime. However, as with any other feature in a programming language, it was a design compromise. The language team at Sun chose to make interoperability possible and allow for gradual adoption without breaking the ecosystem.

Types of Generics in Java

Generics are available for classes, methods, and interfaces. Each of them serves a specific purpose and has a particular usage pattern.

Records, previewed in Java 14 and finalized in Java 16, are also a valid target for Generic design.

Generic Classes

A Generic class can operate on any of the specified types. This is particularly useful for creating collections and data structures that can handle or hold various object types without sacrificing type safety.

The syntax to specify the Generic type is to put it between <> (angle brackets) directly after the type name.

For a single Generic type, T is usually used. Collections use E for “element”, and maps use K and V for keys and values. Return types are often called R. Any valid identifier will suffice, but single uppercase letters are preferred so they don’t clash with existing type names.

The parameterized type is available throughout the class and can be used for fields, method return types, or method arguments:

java

public class Box<T> {

    private T value;

    public T getValue() {
        return value;
    }

    public void setValue(T value) {
        this.value = value;
    }
}

This Box<T> type can hold any single type. Without any further type bound, T is treated as Object, so only Object’s methods can be called on the field or the value argument.

Only once T is specified for an instance does the Box magically become type-safe:

java

Box<CustomDataStructure> box = new Box<>();

box.setValue(dataStructure);

CustomDataStructure value = box.getValue();

This particular instance of Box<T> can only hold CustomDataStructure. There’s no need to cast anything, code completion infers the correct type, and compilation fails if you use the wrong type.

Records behave like classes: the Generic type can be used for a component in the canonical constructor, as a method argument, or as a return type:

java

public record Box<T>(T value) { }

public record AnotherBox<T>(String value) {

    public T genericMethod(T incoming) {
        // ...
        return incoming;
    }
}

Even though Enums, like Records, are a special kind of class under the hood, they can’t declare Generic type parameters themselves. They can only implement a Generic interface with concrete type arguments or declare Generic methods.
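Here’s a small sketch of both options, using a hypothetical Operation enum that implements java.util.function.Function<Integer, Integer> and declares a Generic static helper:

```java
import java.util.List;
import java.util.function.Function;

// "enum Operation<T>" would not compile, but implementing a
// Generic interface with concrete type arguments is fine
enum Operation implements Function<Integer, Integer> {
    INCREMENT {
        @Override
        public Integer apply(Integer value) {
            return value + 1;
        }
    },
    DOUBLE {
        @Override
        public Integer apply(Integer value) {
            return value * 2;
        }
    };

    // A Generic method with its own type parameter is allowed, too
    static <T> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        System.out.println(DOUBLE.apply(21));                   // prints 42
        System.out.println(INCREMENT.andThen(DOUBLE).apply(3)); // prints 8
        System.out.println(firstOrNull(List.of("a", "b")));     // prints a
    }
}
```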

Generic Methods

Like classes having Generic type parameters, methods can have their own, which are independent of any class-level ones, if any. This creates more flexible and reusable methods that can operate on multiple types.

Similar to classes, the Generic type information is placed in <>. This time, however, it’s placed before the return type:

java

public final class Printer {

    public <E> void print(Collection<E> collection) {
        for (E element : collection) {
            System.out.println(element);
        }
    }
}

The print method simply iterates over a collection and prints the elements to System.out. As E is effectively Object, println calls each element’s toString method. There’s no need for the actual type, allowing the method to print any kind of Collection<E> without relying on raw types.
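Generic methods become even more useful when the type parameter also appears in the return type, as the compiler then infers it from the arguments. A sketch with a hypothetical lastOrDefault helper:

```java
import java.util.List;

class GenericMethods {

    // <T> declares a method-level type parameter; the return type reuses it
    static <T> T lastOrDefault(List<T> items, T fallback) {
        return items.isEmpty() ? fallback : items.get(items.size() - 1);
    }

    public static void main(String[] args) {
        // T is inferred as String from the arguments, so no cast is needed
        String last = lastOrDefault(List.of("a", "b", "c"), "none");

        // An explicit type witness is possible, but rarely necessary
        Integer fallback = GenericMethods.<Integer>lastOrDefault(List.of(), -1);

        System.out.println(last + " " + fallback); // prints "c -1"
    }
}
```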

Generic Interfaces

Maybe the most common use of Generics is in interfaces, as they give us “generic” blueprints for types that are flexible and versatile.

Let’s talk about the java.util.function.Function interface. It represents a function that accepts a single value and returns another one. Any type of value is accepted, and any type might get returned.

Without Generics, there are only two ways to do that:

Use Object for everything
Add a variant for each combination of types

The first option isn’t type-safe and not much fun to use.

The latter is feasible if we want to support a few specific, non-general types, but will most likely lead to code duplication, especially if we want to support more types. And Function doesn’t actually care about the actual types, nor does it do anything with them except provide the scaffold for creating lambdas and ensuring type safety.

So, let’s use Generics!

java

public interface Function<T, R> {

    R apply(T t);
}

This simplified representation of Function<T, R> has everything we need to know about it: a function is applied to a value of type T and produces a value of type R.

Now we have a generic template for any Function we ever use.

Using the Generic type isn’t restricted to abstract method declarations. We can also use them for default and static methods. This is how Function<T, R> supports functional composition and the identity function:

java

public interface Function<T, R> {

    R apply(T t);

    default <V> Function<V, R> compose(Function<? super V, ? extends T> before) {
        Objects.requireNonNull(before);
        return (V v) -> apply(before.apply(v));
    }

    default <V> Function<T, V> andThen(Function<? super R, ? extends V> after) {
        Objects.requireNonNull(after);
        return (T t) -> after.apply(apply(t));
    }

    static <T> Function<T, T> identity() {
        return t -> t;
    }
}
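To see how these default methods behave, here’s a short usage sketch with the real java.util.function.Function (the two sample functions are arbitrary). Note the order: andThen applies the other function after this one, compose applies it before:

```java
import java.util.function.Function;

class CompositionDemo {

    static final Function<Integer, Integer> PLUS_ONE = x -> x + 1;
    static final Function<Integer, Integer> TIMES_TWO = x -> x * 2;

    public static void main(String[] args) {
        // andThen: first PLUS_ONE, then TIMES_TWO -> (3 + 1) * 2
        System.out.println(PLUS_ONE.andThen(TIMES_TWO).apply(3)); // prints 8

        // compose: first TIMES_TWO, then PLUS_ONE -> (3 * 2) + 1
        System.out.println(PLUS_ONE.compose(TIMES_TWO).apply(3)); // prints 7

        // The type can change along the chain: Integer -> Integer -> String
        Function<Integer, String> describe = PLUS_ONE.andThen(n -> "Result: " + n);
        System.out.println(describe.apply(41)); // prints "Result: 42"

        // identity() infers T from the target type
        Function<String, String> same = Function.identity();
        System.out.println(same.apply("unchanged")); // prints "unchanged"
    }
}
```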

Bounded Type Parameters

You might have spotted the <? super V> or <? extends T> in the previous section, which were used without any further explanation.

These are bounded wildcards. Like bounded type parameters, such as <T extends Number>, they constrain Generic types by creating an upper or lower bound of acceptable types to replace the Generic parameter.

Upper Bounded Wildcards (extends)

The extends keyword creates an upper bound, meaning the Generic type must be the bound itself, a subclass of it, or, if the bound is an interface, a type implementing it.

For example, what if we want a Box<T> but only want to allow numbers? The JDK’s numeric wrapper types, like Integer, Long, and Double, all extend Number, so we can use it as an upper bound:

java

public class NumberBox<T extends Number> {
    // Unchanged implementation
}

At this point, NumberBox<T> is just a more limited version of Box<T>. The real power of a bounded type is that T is no longer treated as Object but as its bound, Number:

java

public class NumberBox<T extends Number> {
    // ...

    public long longValue() {
        return this.value.longValue();
    }
}

A NumberBox<T> could hold an Integer or a Double, but the type constraint allows you to write an implementation based on their shared ancestor.
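Putting the pieces together, a complete NumberBox might look like the following sketch. The static of factory is a hypothetical convenience addition, not part of the original class:

```java
class NumberBox<T extends Number> {

    private T value;

    public T getValue() {
        return value;
    }

    public void setValue(T value) {
        this.value = value;
    }

    // Hypothetical convenience factory, not part of the original class
    static <N extends Number> NumberBox<N> of(N value) {
        NumberBox<N> box = new NumberBox<>();
        box.setValue(value);
        return box;
    }

    // The bound lets us call Number's methods on T
    public long longValue() {
        return this.value.longValue();
    }

    public static void main(String[] args) {
        System.out.println(NumberBox.of(42).longValue());  // prints 42
        System.out.println(NumberBox.of(3.9).longValue()); // prints 3, as longValue() truncates

        // DOES NOT COMPILE: String is not a subtype of Number
        // NumberBox<String> stringBox = new NumberBox<>();
    }
}
```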

Lower Bounded Wildcards (super)

Where an upper bound creates a ceiling for the type, a lower bound works the other way around: it creates a floor, and the type must be the bound itself or one of its supertypes, all the way up to Object.

Let’s say we have a method that only works on a List containing Integer or one of its parent types. That means it should accept Integer, Number, or Object, but not sibling types like Double, which the previous upper bound would have allowed.

That’s where the keyword super comes in:

java

void addTheAnswer(List<? super Integer> list) {
    list.add(42);
}

The Generic type of List<E> is now constrained to any type (represented by ?) that’s either Integer or above it in its type hierarchy:

java

import java.util.ArrayList;
import java.util.List;

public class LowerBounds {

    static void addTheAnswer(List<? super Integer> list) {
        list.add(42);
    }

    public static void main(String[] args) {

        // Number is the parent of Integer
        List<Number> numberList = new ArrayList<>();
        addTheAnswer(numberList);

        // Object is the parent of Number
        List<Object> objectList = new ArrayList<>();
        addTheAnswer(objectList);

        // THIS DOES NOT COMPILE!
        // List<Double> doubleList = new ArrayList<>();
        // addTheAnswer(doubleList);
    }
}

The List<Double> code won’t compile, as Double isn’t a supertype of Integer; the two only share the common ancestor Number. The lower bound specifies Integer, so the compiler enforces the constraint.

Unbounded Wildcards (?)

The last type of wildcard is the unbounded variety.

Earlier, we had an example of a Generic print method that didn’t need any type information:

java

<E> void print(Collection<E> collection) {
    for (E element : collection) {
        System.out.println(element);
    }
}

The reason for making it Generic was to avoid relying on a raw type while still accepting any type of collection. The actual E never mattered, and an Object would suffice.

This could also be achieved with an upper bound of Object on the Collection’s type:

java

void print(Collection<? extends Object> collection) {
    for (Object element : collection) {
        System.out.println(element);
    }
}

It works, but it’s a verbose way of saying that any type is allowed, and we don’t actually care about it.

That’s why there’s a shorter alternative: the unbounded wildcard ? (question mark):

java

void print(Collection<?> collection) {
    for (Object element : collection) {
        System.out.println(element);
    }
}

It’s a helpful feature for creating reusable and versatile code without concern for the specific Generic type. It keeps our code more straightforward and the compiler happy.

However, there is a downside to unbounded wildcards.

As it represents any type, it’s an unspecific type that won’t become more specific when used. No matter what kind of Collection<E> you pass to print, the argument collection remains a Collection<?>, making certain operations impossible, such as adding to it:

java

void addTheAnswer(List<?> list) {

    // THIS DOES NOT COMPILE!
    list.add(42);
    // The method add(int, capture#1-of ?) in the type
    // List<capture#1-of ?> is not applicable for the arguments (int)
}

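The unwieldy compiler error shows that <?>, just like its long form <? extends Object>, doesn’t mean “anything goes”: it stands for one specific but unknown type. As the compiler can’t verify that an Integer matches that type, it rejects the call, making the collection effectively read-only (only null can still be added).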
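If a method really needs to write to a List<?>, a common workaround is a private Generic helper method. The compiler “captures” the unknown type and binds it to the helper’s type parameter, which is known as wildcard capture. A sketch with hypothetical method names:

```java
import java.util.ArrayList;
import java.util.List;

class WildcardCapture {

    // Public API: accepts any List, the element type doesn't matter to callers
    static void swapFirstAndLast(List<?> list) {
        // list.set(0, list.get(list.size() - 1)); // DOES NOT COMPILE with List<?>
        swapHelper(list); // the unknown element type is captured as T
    }

    // Private helper: T names the captured type, so get and set line up again
    private static <T> void swapHelper(List<T> list) {
        if (list.size() < 2) {
            return;
        }
        T first = list.get(0);
        list.set(0, list.get(list.size() - 1));
        list.set(list.size() - 1, first);
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(List.of("a", "b", "c"));
        swapFirstAndLast(letters);
        System.out.println(letters); // prints [c, b, a]
    }
}
```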

Intersection Types

A lesser-known feature that’s quite helpful with Generic bounds is intersection types. They express that a type parameter must satisfy multiple type constraints simultaneously.

So far, we looked at how to constrain a parameterized type with a single bounded type. But sometimes, it makes a lot of sense to enforce conformity to multiple types, like common-use interfaces.

Take Serializable, for example. It’s a marker interface that a lot of classes in the JDK, and most likely also your code, implement.

Using it alone as an upper bound wouldn’t constrain the Generic type as narrowly as we might need. Adding an intersection type with the help of & (ampersand) creates an actually viable constraint:

java

<T extends Runnable & Serializable> void execute(Collection<T> tasks) {
    // ...
}

This method only accepts a Collection of a type that is both a Runnable and Serializable. Either type alone is quite open, but in combination, the bound has a specific meaning for the task at hand.

Even Generic intersection types are possible, like adding Comparable<T> into the mix:

java

<T extends Runnable & Serializable & Comparable<T>> void execute(Collection<T> tasks) {
    // ...
}

This creates quite specific bounds and gives us access to methods of all the types without having to restrict the method to a concrete type implementing them.
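As a runnable sketch, the following hypothetical method requires its elements to be both a CharSequence and Comparable to themselves, so it can use compareTo and length() on the same value; String satisfies both. Note that if one of the bounds is a class, it has to come first, followed by the interfaces:

```java
import java.util.List;

class IntersectionDemo {

    // T must be a CharSequence AND Comparable to itself, e.g., String
    static <T extends CharSequence & Comparable<T>> String describeLargest(List<T> items) {
        T largest = items.get(0);
        for (T item : items) {
            if (item.compareTo(largest) > 0) { // compareTo comes from Comparable<T>
                largest = item;
            }
        }
        return largest + " (" + largest.length() + " chars)"; // length() comes from CharSequence
    }

    public static void main(String[] args) {
        System.out.println(describeLargest(List.of("apple", "pear", "fig"))); // prints "pear (4 chars)"
    }
}
```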

Generics and Reflection

Reflection is another essential feature of Java. One of its capabilities is accessing type information at runtime. But how well does that work with type-erased Generics?

If a method returns a List<T>, how can we identify the concrete type of T at runtime? Well, even with the Generic-aware methods added to java.lang.reflect.Method in Java 5, you won’t get what you expect:

java

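import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class GenericReflection {

    static record Box<T>(T value) {

        // public, so that getMethod(...) can find it
        public List<T> asList() {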
            return List.of(value());
        }
    }

    public static void main(String... args) throws NoSuchMethodException, SecurityException {

        // CREATE A BOX FOR STRINGS AND GET ITS CLASS
        Box<String> box = new Box<>("a box");
        Class<?> concreteClazz = box.getClass();

        // LOOKUP THE METHOD
        Method method = concreteClazz.getMethod("asList");

        // TRY TO FIND THE GENERIC TYPE
        Type returnType = method.getGenericReturnType();
        if (returnType instanceof ParameterizedType paramType) {
            Type[] actualTypeArguments = paramType.getActualTypeArguments();

            for (Type type : actualTypeArguments) {
                System.out.println("Type: " + type.getTypeName());
            }
        }
    }
}

The code above outputs Type: T.

We already knew that, and it’s not very helpful… But this is kind of expected.

Java doesn’t generate a separate class for every use of a Generic class, so there’s just a single Box.class without any information about the actual type arguments, besides the type variable T itself.

If we need the actual type at runtime, there are workarounds/hacks, though.

Type Tokens

Type tokens are a hack to create a token that knows its Generic type. The token is an abstract class that reads the type information in its constructor. As the type arguments of a class’s superclass are recorded in its class file, we need to create an ad-hoc subclass, usually an anonymous one, for every token:

java

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public abstract class TypeToken<T> {

    private final Type type;

    public TypeToken() {
        ParameterizedType pType = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = pType.getActualTypeArguments()[0];
    }

    public Type type() {
        return this.type;
    }
}

Now, we can store a Generic type in a token:

java

TypeToken<List<String>> token = new TypeToken<>() {};

System.out.println("Actual Generic Type: " + token.type());

This simple implementation illustrates the general concept of type tokens. Even though we finally get the used Generic type, printed as java.util.List<java.lang.String>, the Type interface has little to no additional information for us to work with.

We need a much more complex implementation to get actually useful type information. If you want to know more about how it’s done, I recommend checking out Guava’s TypeToken.
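The mechanism the TypeToken relies on can be seen without the abstract class, too. This sketch (names hypothetical) creates an anonymous subclass of ArrayList<String> and unpacks the recorded ParameterizedType into its raw type and type argument:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class SuperclassCapture {

    static String describe() {
        // The trailing {} creates an anonymous subclass of ArrayList<String>.
        // Its class file records the full supertype, including the type argument.
        List<String> anonymous = new ArrayList<String>() {};

        Type superType = anonymous.getClass().getGenericSuperclass();
        ParameterizedType pType = (ParameterizedType) superType;

        Type raw = pType.getRawType();                     // java.util.ArrayList
        Type argument = pType.getActualTypeArguments()[0]; // java.lang.String

        return raw.getTypeName() + " of " + argument.getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "java.util.ArrayList of java.lang.String"
    }
}
```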

Storing the actual Class

If the compiler won’t store the actual type information for us, we can always do it ourselves:

java

public class StoringTheClass {

    public static class Box<T> {

        private final T        value;
        private final Class<T> genericClass;

        public Box(T value, Class<T> genericClass) {
            this.value = value;
            this.genericClass = genericClass;
        }

        public T value() {
            return this.value;
        }

        public Class<T> genericClass() {
            return this.genericClass;
        }
    }

    public static void main(String... args) {

        Box<String> box = new Box<>("This is a String", String.class);

        System.out.println("Box Generic class: " + box.genericClass());

        // DOES NOT COMPILE
        // Box<Integer> numberBox = new Box<>(42, Long.class);
    }
}

It’s clunky and verbose, but it works. The compiler even ensures that we use the correct class for the argument. Still, I wouldn’t recommend it as an everyday tool on your tool belt. But it’s nice knowing you have it at your disposal when you need it.
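Storing a Class<T> also gives us back some of the runtime checks that erasure takes away, as Class offers isInstance and cast. A sketch of a hypothetical helper that filters a mixed list by type:

```java
import java.util.ArrayList;
import java.util.List;

class ClassFilter {

    // Class<T> carries the runtime type that erasure removes from T
    static <T> List<T> filterByType(List<?> items, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object item : items) {
            if (type.isInstance(item)) {     // runtime check, like an "instanceof T"
                result.add(type.cast(item)); // checked cast, no unchecked warning
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> mixed = List.of("a", 1, "b", 2.5);
        List<String> strings = filterByType(mixed, String.class);
        System.out.println(strings); // prints [a, b]
    }
}
```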

Common Pitfalls and Gotchas

There are a few more things to look out for with Generics besides the general lack of types during runtime.

No Primitives as Generic Types (yet)

Only object types are valid as Generic type arguments, so no int, boolean, etc.

However, we can use the boxed variants, like Integer or Boolean, instead, if the overhead of auto-boxing is an acceptable trade-off in the given context. Be aware that this introduces a new issue: null becomes a valid value for the type, which isn’t possible with primitives.

The OpenJDK Project Valhalla aims to remedy the lack of primitive Generics, among other things, so we might eventually get them in a future Java version.
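A classic example of that null issue: Map.get returns an Integer, which is null for missing keys, and unboxing it into an int throws a NullPointerException. A sketch with hypothetical method names:

```java
import java.util.HashMap;
import java.util.Map;

class BoxingPitfall {

    static int countFor(Map<String, Integer> counts, String key) {
        // get() returns null for missing keys; unboxing null into an int throws
        return counts.get(key);
    }

    static int safeCountFor(Map<String, Integer> counts, String key) {
        // getOrDefault replaces the null before any unboxing happens
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("apples", 3);

        System.out.println(safeCountFor(counts, "pears")); // prints 0

        try {
            countFor(counts, "pears");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while unboxing null");
        }
    }
}
```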

Avoid Generic Arrays

There are several issues with Generic arrays in Java.

There’s no way to instantiate them directly:

java

public class ArrayBox<T> {

    private final T[] arr = new T[42]; // Compilation error

    public T[] arr() {
        return this.arr;
    }
}

The usual workaround is to create an Object[] and do an unchecked cast to T[]:

java

private final T[] arr = (T[]) new Object[42];

Even with this cast in place, a ClassCastException is lurking. The cast is unchecked, so at runtime, the array is still an Object[] and not an array of whatever T is supposed to be. On top of that, Java arrays are covariant, meaning an array can be assigned to a variable of a supertype array.

For example, an Integer[] can be assigned to a Number[]; no issues here, as Integer is a subtype of Number.

If T is a subtype of U, then T[] is also a subtype of U[]

Covariance lets the following code compile, as the compiler assumes an ArrayBox<Integer> returns an Integer[]. But the array leaving ArrayBox is actually an Object[], so the cast the compiler inserts at the call site fails at runtime:

java

ArrayBox<Integer> box = new ArrayBox<>();

// Compiles thanks to covariance, but the actual array is an Object[]
Number[] nArr = box.arr();
// => ClassCastException at runtime

The same approach of assigning to a supertype isn’t possible with Generic types. For example, if the field T[] arr were replaced with a List<T>, the compiler would prevent you from assigning a List<Integer> to a List<Number>, as Generic types aren’t covariant.
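The difference is easy to demonstrate. In this sketch (names hypothetical), storing a Double through a Number[] that’s really an Integer[] fails at runtime with an ArrayStoreException, whereas the equivalent List assignment is rejected at compile-time:

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceDemo {

    static String storeDoubleInIntegerArray() {
        Integer[] integers = new Integer[1];
        Number[] numbers = integers; // compiles: arrays are covariant
        try {
            numbers[0] = 2.5;        // a Double is a valid Number...
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException"; // ...but the runtime array is an Integer[]
        }
    }

    public static void main(String[] args) {
        System.out.println(storeDoubleInIntegerArray()); // prints ArrayStoreException

        List<Integer> integerList = new ArrayList<>();
        // List<Number> numberList = integerList; // DOES NOT COMPILE: Generics are invariant
        List<? extends Number> view = integerList; // compiles, but rejects adding (except null)
        System.out.println(view.size()); // prints 0
    }
}
```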

The easiest way around the array issues is to use one of the Collection types like List<T>, Set<T>, or Map<K, V> instead. These types are designed to be generic and provide type-safe operations. And if you still want an array to back the data, just choose ArrayList<T> as the concrete implementation for a List<T>.
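If an array is truly required, a Class<T> like in the “Storing the actual Class” approach lets us create an array with the correct runtime type via java.lang.reflect.Array.newInstance. A sketch with a hypothetical TypedArrayBox, assuming the passed Class isn’t a primitive one like int.class:

```java
import java.lang.reflect.Array;

class TypedArrayBox<T> {

    private final T[] arr;

    // Assumes type is not a primitive class such as int.class
    TypedArrayBox(Class<T> type, int size) {
        // Array.newInstance returns Object, so a cast is still needed,
        // but the array really has the component type T at runtime
        @SuppressWarnings("unchecked")
        T[] created = (T[]) Array.newInstance(type, size);
        this.arr = created;
    }

    T[] arr() {
        return this.arr;
    }

    public static void main(String[] args) {
        TypedArrayBox<Integer> box = new TypedArrayBox<>(Integer.class, 3);
        Number[] numbers = box.arr(); // no ClassCastException: it's a real Integer[]
        System.out.println(numbers.getClass().getSimpleName()); // prints Integer[]
    }
}
```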

Don’t Use Raw Types

Not specifying the Generic type makes it raw, which effectively means Object, and therefore, no type safety at compile-time:

java

List raw = new ArrayList();

// We can add anything to a raw List
raw.add("a String value");
raw.add(42);

A raw type can be assigned to a specific Generic type, which might lead to a ClassCastException later on:

java

List<String> strings = raw;  // Unchecked conversion, only a compiler warning
String str = strings.get(1); // The element at index 1 is the Integer 42
// => ClassCastException

That’s why we should never use raw types: avoiding them ensures type safety at compile-time and prevents surprises at runtime.

No instanceof for T

Another obvious restriction of Generics: type erasure doesn’t leave any type information for the runtime check instanceof, which requires a concrete type, so a check like obj instanceof T isn’t possible.

However, there are scenarios where the actual parameterized type isn’t necessary, so we can still check for the raw type:

java

public class NoInstanceOf {

    record Box<T>(T value) { }

    public static void main(String... args) {
        Box<String> box = new Box<>("Not an Integer");

        testBox(box);
    }

    private static void testBox(Object maybeBox) {
        if (maybeBox instanceof Box box) {
            System.out.println("This is a box: " + box);
        }
    }
}

The raw Box can be converted further without any check. In the case of the record and its automagically generated toString method, the following code even runs without exploding:

java

private static void testBox(Object maybeBox) {
    if (maybeBox instanceof Box box) {
        Box<Integer> intBox = box;
        System.out.println("This is a box:" + box);
    }
}

The compiler warns us about the unchecked conversion of box, and for a good reason. The toString method might work, but calling intBox.value() makes the compiler insert a cast to Integer for a value that’s actually a String, resulting in a ClassCastException at runtime.
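To see where exactly it fails, here’s a runnable sketch of the scenario (class and method names hypothetical). The unchecked conversion itself succeeds, and the exception only appears at the compiler-inserted cast when the value is read:

```java
class HeapPollution {

    record Box<T>(T value) { }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String unwrapAsInteger(Object maybeBox) {
        if (maybeBox instanceof Box box) {
            Box<Integer> intBox = box; // unchecked conversion: compiles, checks nothing
            try {
                Integer value = intBox.value(); // the inserted cast to Integer fails here
                return "Integer: " + value;
            } catch (ClassCastException e) {
                return "ClassCastException";
            }
        }
        return "not a box";
    }

    public static void main(String[] args) {
        System.out.println(unwrapAsInteger(new Box<>("Not an Integer"))); // prints ClassCastException
        System.out.println(unwrapAsInteger(new Box<>(7)));                // prints Integer: 7
    }
}
```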

Confusing Bounded Wildcards

Bounded wildcards are one of those features that are necessary and quite good, but most developers don’t use them in their day-to-day code.

The simplest way to remember the correct usage is the mnemonic PECS: Producer extends, Consumer super

If an instance of a Generic type such as List<? extends Number> is supposed to produce values, like when we access its elements, using extends means it can be a List of any subtype of Number, and every element we read is guaranteed to be a Number.

If the instance is supposed to consume data, like adding items to it, a List<? super Integer> allows us to add Integer, Number, or Object instances to it.

Copying from one list to another illustrates PECS in a single method:

java

public <T> void copy(List<? extends T> source,      // PRODUCER
                     List<? super T> destination) { // CONSUMER

    for (T item : source) {
        destination.add(item);
    }
}
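A usage sketch of that copy method: two lists of different Number subtypes can be copied into a List<Number>, while the reverse direction is rejected by the compiler. The JDK’s Collections.copy uses the same bounds, with the destination as its first parameter:

```java
import java.util.ArrayList;
import java.util.List;

class PecsDemo {

    static <T> void copy(List<? extends T> source,      // PRODUCER
                         List<? super T> destination) { // CONSUMER
        for (T item : source) {
            destination.add(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> integers = List.of(1, 2, 3);
        List<Double> doubles = List.of(4.5);
        List<Number> numbers = new ArrayList<>();

        // Any T between Integer (or Double) and Number satisfies both bounds
        copy(integers, numbers);
        copy(doubles, numbers);
        System.out.println(numbers); // prints [1, 2, 3, 4.5]

        // copy(numbers, integers); // DOES NOT COMPILE: a Number isn't necessarily an Integer
    }
}
```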

My Thoughts on Generics

Generic code allows us to create versatile and reusable code that can operate with any type based on our specified constraints. This minimizes code duplication and conveys its purpose in a more abstract way.

However, Generics are not without their flaws. In Java, Generics are definitely an essential feature. But to be honest, their implementation can sometimes feel convoluted and lacking, especially when compared to other languages.

For developers consuming generic APIs, they are generally straightforward to use and offer substantial functionality. Writing generic APIs, on the other hand, can be more challenging. Personally, I appreciate the idea of creating powerful yet complex tools to write code without imposing the same level of complexity on those who use the code.

Still, type erasure can be frustrating at times. On the other hand, it can also be beneficial to avoid dealing with parameterized types when they aren’t necessary. That was an actual issue I encountered with Swift, where I couldn’t easily fall back to a raw type. Seeing the kinds of problems different Generic approaches have perfectly illustrates that any guarantee on one side may limit flexibility on the other.

Language designers always have to compromise and decide their top priority. And in good Java tradition, backward compatibility was a top priority.

No matter what the designers choose, there’s always controversy on “how to do it right”, especially for features that get added late into a language, like Generics.

Just look at Golang! There was much controversy and extensive debate around how and even if Generics should be introduced to the language. The eventual design was a compromise aimed at providing powerful new capabilities while adhering to Go’s simplicity, performance, and backward compatibility principles.

Resources

Lesson: Generics (Updated) (The Java Tutorials)
Parameterized Types (JLS §4.5)
