Understanding Java Generics: Key Features and Common Pitfalls
Generics are an indispensable feature in the world of Java programming, shaping how we write and interact with code. Despite their ubiquity and importance, many developers only scratch the surface of what Generics can do.
While they provide powerful tools for creating flexible and reusable code, the intricacies and subtleties often remain a mystery to many, myself included.
That’s why I wrote this article to demystify Generics and offer an overview of their features and common pitfalls so we can confidently harness their full potential.
Why Generics?
Java Generics were introduced in Java 5 in 2004, marking a significant evolution of the language that enhanced type safety and enabled more robust and reusable code.
Before Generics, non-specialized collection and container types simply stored instances of Object, requiring explicit casting and type checking, which was both cumbersome and error-prone.
When everything is Object, the compiler can’t help out, and one wrong casting decision later: boom! ClassCastException.
The following non-Generic, raw code compiles fine (except for the compiler nagging about the unchecked/unsafe operations):
As you might have already suspected, the code explodes when run, as the String representation of 42 is not an Integer, generating the following output:
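A minimal sketch of the kind of raw code described above; the class and method names are illustrative assumptions, not the original listing. It compiles (with unchecked warnings, suppressed here) and fails only at runtime:

```java
import java.util.ArrayList;
import java.util.List;

class RawListDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Integer firstAsInteger() {
        List values = new ArrayList(); // raw type: every element is just an Object
        values.add("42");              // a String, not an Integer
        return (Integer) values.get(0); // compiles, but the cast fails at runtime
    }

    public static void main(String[] args) {
        try {
            firstAsInteger();
        } catch (ClassCastException e) {
            System.out.println("Boom: " + e.getMessage());
        }
    }
}
```

The exact exception message depends on the Java version, which is why it is only printed here and not spelled out.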
This little example already perfectly illustrates the motivation behind Generics: Type safety.
The primary purpose of Generics is to give classes, interfaces, and methods the possibility to parameterize themselves with the permitted types to be used with them. This explicit definition of types provides compile-time safety, as the compiler now has insight into what you actually want to do.
By enforcing type constraints at compile-time, Generics eliminate the need for explicit type casting and checking, effectively moving many type-related runtime crashes to compile-time. This not only reduces runtime errors and makes your code more reliable, as problems are caught earlier during compile-time, but your code will also be more expressive, as types communicate more clearly what types they represent.
Generics also aimed at improving code reusability and maintainability.
By defining generic classes and methods, developers can write more flexible and reusable code that works with any type.
For example, we could’ve created an IntegerList or a StringList to have a type-safe contract that’s validated at compile-time.
But then, what about the many other types we might want to store in a List?
We’d either duplicate a lot of code or create elaborate type hierarchies to share code between them.
Instead, the Generic List<T> is a singular interface that can handle any singular type for its content, and the compiler will make sure we use it as intended without needing to cast anything.
There are a few limitations on how the compiler checks the related types and the implementation is not 100% generic. More on that later in the article!
Thanks to generics, only a single interface is needed without duplicating the implementation for each type. This not only reduces code duplication but also makes the code easier to understand and maintain.
Overall, the motivation behind Java Generics was to provide a more powerful, expressive, and type-safe language construct that addresses the limitations of pre-generics Java. By allowing types to be parameters, Generics bring the benefits of polymorphism and abstraction to a new level, enabling developers to write cleaner, more efficient, and less error-prone code.
Type Erasure
One thing you might already be familiar with, or caught in the previous paragraphs, is the focus on “compile-time”.
That’s because Generic types are erased during compilation, effectively converting every generic type into its raw type.
The previous List<Integer> effectively becomes a List at runtime, and all its methods work on Object instead of Integer.
You might ask, “why introduce a feature to increase type-safety and then remove that safety at runtime?” It’s not that this is an impossible feature, as there are many languages that retain Generic type information at runtime:
- C# uses the concept of reification for its Generics, whereas Java only uses it for non-Generic types (JLS §4.7).
- C++ templates generate separate copies for each type used.
- Kotlin has the keyword reified, at least for inline functions.
So, why did Java decide to use type erasure?
Well, it’s because of one of Java’s best (and sometimes worst) aspects: backward compatibility.
By replacing all Generic types with either Object or their specified bound, the generated Bytecode is compatible with any code and runtimes that existed before.
Nowadays, we don’t think much about it, as everything uses Generics, and most likely, all the code we’re using has evolved since Java 5. But in 2004, the Java world looked different. There already was a considerable ecosystem before the introduction of Generics, so the Java team of the past made sure that a seamless transition to Generics was possible.
The Bytecode generated from Generic and non-Generic code is uniform, meaning it follows the same structure and type-information. This uniformity guarantees that the JVM can execute both without any special handling or changes to the JVM itself.
Of course, there are downsides to choosing type erasure, too. Primarily the lack of types at runtime. However, as with any other feature in a programming language, it was a design compromise. The language team at Sun chose to make interoperability possible and allow for gradual adoption without breaking the ecosystem.
Types of Generics in Java
Generics are available for classes, methods, and interfaces. Each of them serves a specific purpose and has a particular usage pattern.
Records, introduced as a preview feature in Java 14 and finalized in Java 16, are also a valid target for Generic design.
Generic Classes
A Generic class can operate on any of the specified types. This is particularly useful for creating collections and data structures that can handle or hold various object types without sacrificing type safety.
The syntax to specify the Generic type is putting it between <> (angle brackets) directly behind the type name.
For a single Generic type, usually T is used. Collections use E for “element”, or K and V for maps. Return types are often called R. Any valid identifier will suffice, but single uppercase letters are preferred so the type parameter doesn’t clash with an existing type name.
The parameterized type is available throughout the class and can be used for fields, method return types, or method arguments:
This Box<T> type can hold any singular type.
Without any further type bound, T will be treated as Object if you want to call any method on the field or argument value.
Only when an instance is created and T gets specified does the Box magically become type-safe:
The particular instance of Box<T> can only hold CustomDataStructure.
No need for casting anything; code-completion will infer the correct type, and the compilation will fail if you use the wrong kind.
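A minimal sketch of what such a Box<T> and its usage could look like. The accessor names and the contents of CustomDataStructure are assumptions made for illustration only:

```java
class BoxDemo {
    // Generic container: T is usable for fields, parameters, and return types
    static class Box<T> {
        private T value;

        Box(T value) {
            this.value = value;
        }

        T getValue() {
            return value;
        }

        void setValue(T value) {
            this.value = value;
        }
    }

    // Hypothetical stand-in for the CustomDataStructure mentioned in the text
    static class CustomDataStructure {
        final String name;

        CustomDataStructure(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Box<CustomDataStructure> box = new Box<>(new CustomDataStructure("example"));
        CustomDataStructure content = box.getValue(); // no cast needed
        System.out.println(content.name);
        // box.setValue("a String"); // would not compile: incompatible types
    }
}
```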
Records behave like classes, where the Generic type can be either used for a component in the canonical constructor, as method arguments, or as a return type:
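As a rough sketch, a Generic record might look like this. The Pair name and its swapped method are hypothetical, and the example assumes Java 16 or newer:

```java
class RecordDemo {
    // T is used for the components, and as the type argument of a return type
    record Pair<T>(T first, T second) {
        Pair<T> swapped() {
            return new Pair<>(second, first);
        }
    }

    public static void main(String[] args) {
        Pair<String> pair = new Pair<>("a", "b");
        String first = pair.first(); // accessor returns String, no cast needed
        System.out.println(first + " / " + pair.swapped());
    }
}
```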
Even though Enums are a special kind of class under the hood, like Records, they can’t declare Generic type parameters themselves. They can only implement a Generic interface with concrete types or declare Generic methods.
Generic Methods
Like classes having Generic type parameters, methods can have their own, which are independent of any class-level ones, if any. This creates more flexible and reusable methods that can operate on multiple types.
Similar to classes, the Generic type information is placed in <>.
This time, however, it’s placed before the return type:
The print method simply iterates over a collection and prints the elements to System.out. As T is effectively Object, the toString method is called.
There’s no need for actual types, allowing the method to print any type of Collection<E> without relying on raw types.
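A print method along those lines could be sketched as follows; the surrounding class name is an assumption:

```java
import java.util.Collection;
import java.util.List;

class PrintDemo {
    // <T> in front of the return type declares a method-level type parameter
    static <T> void print(Collection<T> collection) {
        for (T element : collection) {
            // println converts the element via String.valueOf, which calls toString()
            System.out.println(element);
        }
    }

    public static void main(String[] args) {
        print(List.of("a", "b"));
        print(List.of(1, 2, 3));
    }
}
```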
Generic Interfaces
Maybe the most common use for Generics is in interfaces, as they give us “generic” blueprints for types that are flexible and versatile.
Let’s talk about the java.util.function.Function interface.
It represents a function that accepts a single value and returns another one.
Any type of value is accepted, and any type might get returned.
Without Generics, there are only two ways to do that:
- Use Object for everything
- Add a variant for each combination of types
The first option isn’t type-safe and not much fun to use.
The latter is feasible if we want to support a few specific, non-general types, but will most likely lead to code duplication, especially if we want to support more types.
And Function doesn’t care about the actual types or do anything with them except provide the scaffold for creating lambdas and ensuring type safety.
So, let’s use Generics!
This simplified representation of Function<T, R> has everything we need to know about it: a function is applied to value T, which creates a value R.
Now we have a generic template for any Function we ever use.
Using the Generic type isn’t restricted to abstract method declarations.
We can also use them for default and static methods.
This is how Function<T, R> supports functional composition and identity function:
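To illustrate, here is a simplified stand-in for the interface. It is named Fn here (a hypothetical name) so it doesn’t get confused with the real java.util.function.Function, whose method signatures it mirrors:

```java
class FunctionDemo {
    // Hypothetical, simplified stand-in for java.util.function.Function<T, R>
    interface Fn<T, R> {
        R apply(T value);

        // Runs `before` first, then this function
        default <V> Fn<V, R> compose(Fn<? super V, ? extends T> before) {
            return v -> apply(before.apply(v));
        }

        // Runs this function first, then `after`
        default <V> Fn<T, V> andThen(Fn<? super R, ? extends V> after) {
            return t -> after.apply(apply(t));
        }

        // Returns its argument unchanged
        static <T> Fn<T, T> identity() {
            return t -> t;
        }
    }

    public static void main(String[] args) {
        Fn<String, Integer> length = String::length;
        Fn<Integer, String> describe = n -> "length=" + n;
        System.out.println(length.andThen(describe).apply("generic")); // length=7
    }
}
```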
Bounded Type Parameters
You might have spotted the <? super V> or <? extends T> in the previous chapter, which were used without any further explanation.
These are bounded wildcards. Like bounded type parameters, they constrain Generic types by creating an upper or lower bound on the types that are acceptable in place of the Generic parameter.
Upper Bounded Wildcards (extends)
The extends keyword creates an upper bound, meaning the Generic type parameter must be a subclass of that type or implement that particular interface.
For example, what if we want a Box<T> but only want to allow numbers?
The boxed numeric types of the JDK are all descendants of the Number type, so we can add it as an upper bound:
At this point, NumberBox<T> is just a more limited version of Box<T>.
The real power a bounded type gives you is that T is no longer treated as an Object but as its bound, Number:
A NumberBox<T> could hold an Integer or Double, but the type constraint allows you to create a specific implementation based on the shared ancestor.
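A sketch of such a NumberBox<T>; the doubled method is a hypothetical example of logic that relies on the Number bound:

```java
class NumberBoxDemo {
    static class NumberBox<T extends Number> {
        private final T value;

        NumberBox(T value) {
            this.value = value;
        }

        T getValue() {
            return value;
        }

        // T is at least a Number, so Number's methods are available
        double doubled() {
            return value.doubleValue() * 2;
        }
    }

    public static void main(String[] args) {
        NumberBox<Integer> box = new NumberBox<>(21);
        System.out.println(box.doubled()); // 42.0
        // NumberBox<String> text = new NumberBox<>("21"); // would not compile
    }
}
```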
Lower Bounded Wildcards (super)
Where an upper bound creates a type ceiling, a lower bound works the other way around: it creates a floor, with Object as the highest option.
Let’s say we have a method that only works on a List containing an Integer or one of its parent types.
That means it should only accept Integer, Number, or Object, but not any of the other Number types like with the previous upper bounded wildcard.
That’s where the keyword super comes in:
The Generic type of List<E> is now constrained to any type (represented by ?) that’s either an Integer or above in its type hierarchy:
The List<Double> code won’t compile, as Double has no direct relationship with Integer, just the shared higher-up Number.
However, the lower bound specifies Integer, so the compiler enforces the constraint.
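A small sketch of a lower-bounded method and its call sites; the addNumbers name and its body are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class LowerBoundDemo {
    // Accepts List<Integer>, List<Number>, List<Object>, ...
    static void addNumbers(List<? super Integer> list) {
        for (int i = 1; i <= 3; i++) {
            list.add(i); // adding an Integer is safe for every accepted list
        }
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<>();
        addNumbers(numbers);

        List<Object> objects = new ArrayList<>();
        addNumbers(objects);

        // List<Double> doubles = new ArrayList<>();
        // addNumbers(doubles); // would not compile: Double is not a supertype of Integer

        System.out.println(numbers); // [1, 2, 3]
    }
}
```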
Unbounded Wildcards (?)
The last type of wildcard is the unbounded variety.
We had the previous example of a Generic print method that didn’t need any type-information:
The reasoning for making it Generic was to avoid relying on a raw type while still being able to accept any type of collection.
The actual E never mattered, and an Object would suffice.
This could be achieved with an upper bound on Collection:
It works, but it’s a verbose way of saying that any type is allowed, and we don’t actually care about it.
That’s why there’s a shorter alternative: the unbounded wildcard ? (question mark):
It’s a helpful feature for creating reusable and versatile code without concern for the specific Generic type. It keeps our code more straightforward and the compiler happy.
However, there is a downside to unbounded wildcards.
As they represent any type, it’s an unspecific type and won’t become more specific when used.
No matter what kind of Collection<E> you pass to print, the argument collection will remain a Collection<?>, making certain operations impossible, such as adding to it:
The unwieldy compiler error shows that <?> doesn’t simply mean “Object” but represents “some unknown type”. As the compiler can’t verify that a value matches that unknown type, nothing but null can be added, enforcing an effectively read-only nature. The same applies to <? extends Object>.
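The following sketch shows a print method based on the unbounded wildcard, with the rejected add call left in as a comment. The class name is an assumption:

```java
import java.util.Collection;
import java.util.List;

class WildcardDemo {
    static void print(Collection<?> collection) {
        // collection.add("text"); // would not compile: the element type is unknown

        // Reading is fine, as every element is at least an Object
        for (Object element : collection) {
            System.out.println(element);
        }
    }

    public static void main(String[] args) {
        print(List.of("a", "b"));
        print(List.of(1, 2, 3));
    }
}
```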
Intersection Types
A lesser-known feature that’s quite helpful with Generic bounds is intersection types. They are a way to express that a type parameter must simultaneously satisfy multiple type constraints.
So far, we looked at how to constrain a parameterized type with a single bounded type. But sometimes, it makes a lot of sense to enforce the conformity to multiple types, like common-use interfaces.
Take Serializable, for example.
It’s a marker interface that a lot of classes in the JDK, and most likely also your code, adhere to.
Using it as an upper bound wouldn’t constrain the Generic type as narrowly as we might need.
Adding an intersection type with the help of & (ampersand) will help to create an actual viable constraint:
This method only accepts a Collection of a type that is both a Runnable and Serializable.
Either type alone is quite open, but in combination, the bound has a specific meaning for the task at hand.
Even Generic intersection types are possible, like adding Comparable<T> into the mix:
This creates quite specific bounds and gives us access to methods of all the types without having to restrict the method to a concrete type implementing them.
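A sketch of both variants. The Job class and the method names runAll and runHighest are hypothetical and only exist to satisfy the bounds:

```java
import java.io.Serializable;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class IntersectionDemo {
    // Hypothetical task type that satisfies all bounds used below
    static class Job implements Runnable, Serializable, Comparable<Job> {
        final int priority;
        boolean done;

        Job(int priority) {
            this.priority = priority;
        }

        @Override
        public void run() {
            done = true;
        }

        @Override
        public int compareTo(Job other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // T must be both Runnable and Serializable
    static <T extends Runnable & Serializable> void runAll(Collection<T> tasks) {
        for (T task : tasks) {
            task.run();
        }
    }

    // A Generic bound in the mix: T must also be comparable to itself.
    // Collections.max throws NoSuchElementException for an empty collection.
    static <T extends Runnable & Serializable & Comparable<T>> T runHighest(Collection<T> tasks) {
        T highest = Collections.max(tasks);
        highest.run();
        return highest;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(new Job(1), new Job(5), new Job(3));
        System.out.println(runHighest(jobs).priority); // 5
        runAll(jobs);
    }
}
```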
Generics and Reflection
Reflection is another essential feature of Java. One of its capabilities is accessing type information at runtime. But how well does that work with type-erased Generics?
If a method returns a List<T>, how can we identify the concrete type of T at runtime?
Well, even with the new methods available on java.lang.reflect.Method you won’t get what you expect:
The code above will output Type: T.
We already know that, and it’s not very helpful… But this is kind of expected.
Java doesn’t generate an explicit class for every use of a Generic one, so there’s just a singular Box.class without any type information (besides T).
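A sketch of such a reflective lookup, assuming a Box<T> with a getValue method (both assumptions, as is the helper method name):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Type;

class ReflectionDemo {
    static class Box<T> {
        private final T value;

        Box(T value) {
            this.value = value;
        }

        public T getValue() {
            return value;
        }
    }

    static String genericReturnType() {
        try {
            Method method = Box.class.getDeclaredMethod("getValue");
            Type type = method.getGenericReturnType();
            return type.getTypeName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the type variable is known, not what a particular instance uses for it
        System.out.println("Type: " + genericReturnType()); // Type: T
    }
}
```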
If we need the actual type at runtime, there are workarounds/hacks, though.
Type Tokens
Type tokens are a hack to create a token that knows its Generic type. It’s an abstract class that stores the type information in its constructor, but we need to create an ad-hoc implementation to create a new class:
Now, we can store a Generic type in a token:
This simple implementation should illustrate the general concept of a type token.
Even though we finally get the used Generic type as java.util.List<String>, the Type interface has little to no additional information for us to work with.
We need a much more complex implementation for getting actual useful type information. If you want to know more about how it’s done, I recommend checking out how Guava does it.
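The basic idea can be sketched as follows. This is a deliberately minimal, hypothetical TypeToken, not Guava’s implementation, and it only handles direct anonymous subclasses:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class TypeTokenDemo {
    // Hypothetical minimal type token: it must be subclassed to capture T
    abstract static class TypeToken<T> {
        private final Type type;

        protected TypeToken() {
            // The subclass records its Generic superclass, e.g. TypeToken<List<String>>
            Type superclass = getClass().getGenericSuperclass();
            if (!(superclass instanceof ParameterizedType)) {
                throw new IllegalStateException("TypeToken needs a type argument");
            }
            this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
        }

        Type getType() {
            return type;
        }
    }

    public static void main(String[] args) {
        // The trailing {} creates the ad-hoc anonymous subclass
        TypeToken<List<String>> token = new TypeToken<List<String>>() {};
        System.out.println(token.getType().getTypeName());
    }
}
```

The captured Type is a ParameterizedType in this case, so the raw type and the type arguments have to be picked apart manually.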
Storing the actual Class
If the compiler won’t store the actual type information for us, we can always do it ourselves:
It’s clunky and verbose, but it works. The compiler even ensures that we use the correct class for the argument. Still, I wouldn’t recommend it as an everyday tool on your tool belt. But it’s nice knowing you have it at your disposal when you need it.
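One possible shape of this approach, with the hypothetical name TypedBox, is to pass the Class<T> alongside the value:

```java
class TypedBoxDemo {
    static class TypedBox<T> {
        private final Class<T> type;
        private final T value;

        // The compiler checks that the Class and the value agree on T
        TypedBox(Class<T> type, T value) {
            this.type = type;
            this.value = value;
        }

        Class<T> getType() {
            return type;
        }

        T getValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        TypedBox<String> box = new TypedBox<>(String.class, "text");
        System.out.println(box.getType().getName()); // java.lang.String
        // new TypedBox<String>(Integer.class, "text"); // would not compile
    }
}
```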
Common Pitfalls and Gotchas
There are a few more things to look out for with Generics besides the general lack of types during runtime.
No Primitive as Generic Types (yet)
Only object types are valid for use as a Generic type, so no int, boolean, etc.
However, we can use the boxed variants Integer, Boolean, etc. instead, if the overhead of auto-boxing is an acceptable trade-off for the desired context.
Be aware that this creates a new issue: null becomes a valid value for the type, which isn’t possible for primitives.
Currently, the OpenJDK Project Valhalla tries to remedy the lack of primitive Generics, among other things. So we will eventually get them in a future Java version.
Avoid Generic Arrays
There are several issues with Generic arrays in Java.
There’s no way to instantiate them directly:
The only option is to create an Object[] and do an unsafe cast to T[]:
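A sketch of such an ArrayBox<T>; the field name arr follows the text, everything else is an assumption:

```java
class ArrayBoxDemo {
    static class ArrayBox<T> {
        private final T[] arr;

        @SuppressWarnings("unchecked")
        ArrayBox(int size) {
            // this.arr = new T[size];         // would not compile: generic array creation
            this.arr = (T[]) new Object[size]; // unchecked cast, it's still an Object[] at runtime
        }

        void set(int index, T value) {
            arr[index] = value;
        }

        T get(int index) {
            return arr[index];
        }
    }

    public static void main(String[] args) {
        ArrayBox<Integer> box = new ArrayBox<>(2);
        box.set(0, 42);
        System.out.println(box.get(0)); // 42
    }
}
```

As long as the array never leaves the class, the unchecked cast stays harmless, because T[] is erased to Object[] inside ArrayBox anyway.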
Even with the resulting generic array, we have potential ClassCastException coming our way thanks to Java arrays being covariant.
That means an array can be assigned to a variable of a super-class array type.
For example, the inner array of an ArrayBox<Integer> can be assigned to a Number[]; no issues here, as Integer is a subtype of Number.
If T is a subtype of U, then T[] is also a subtype of U[].
But when we then store a value that’s valid for the Number[] but incompatible with the Generic way of doing things, the compiler won’t complain, leading to runtime exceptions if the wrong type gets inserted after reassignment:
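The covariance problem itself can be sketched with plain arrays, independent of any Generic wrapper. In this reduced form, the runtime exception is an ArrayStoreException:

```java
class ArrayCovarianceDemo {
    static boolean storeFails() {
        Integer[] ints = {1, 2, 3};
        Number[] numbers = ints; // allowed: arrays are covariant
        try {
            numbers[0] = 3.14;   // compiles, but the array is still an Integer[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails()); // true
    }
}
```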
The same approach of up-casting to a common ancestor isn’t possible with non-arrays.
For example, replacing the field T[] arr with a List<T> would lead to the compiler preventing you from trying to assign a List<Integer> to a List<Number>, as the types aren’t covariant.
The easiest way around the array issues is to use one of the Collection types like List<T>, Set<T>, or Map<K, V> instead.
These types are designed to be generic and provide type-safe operations.
And if you still want an array to back the data, just choose ArrayList<T> as the concrete implementation for a List<T>.
Don’t Use Raw Types
Not specifying the Generic type makes it raw, which effectively means Object, and therefore, no type-safety at compile-time:
Raw types can be cast to a specific Generic type, which might lead to a ClassCastException later on:
That’s why we should avoid raw types: sticking to parameterized types ensures type-safety at compile-time and prevents any surprises at runtime.
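A sketch of how a raw type can smuggle the wrong element into a parameterized list (names are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

class RawTypeDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsLater() {
        List raw = new ArrayList();  // raw type: no compile-time checks
        raw.add("not a number");
        List<Integer> numbers = raw; // unchecked conversion, only a warning
        try {
            Integer first = numbers.get(0); // the compiler-inserted cast fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsLater()); // true
    }
}
```

The failure shows up where the element is read, not where the wrong value was added, which makes such bugs hard to track down.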
No instanceof for T
Another obvious restriction of Generics is that instanceof can’t check against T, as type erasure doesn’t leave any type information for the runtime check, which requires a concrete type.
However, there are scenarios where the actual parameterized type isn’t necessary, so we can still check for the raw type:
The Box can be unsafely cast further, and in the case of the record and its automagically generated toString method, the following code runs without exploding:
The compiler warns us about the unsafe cast of box, and for a good reason.
The toString method might work, but using value() will trick the compiler into returning an Integer that doesn’t exist, resulting in a ClassCastException at runtime.
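Both cases can be sketched with a record Box<T>(T value), assuming Java 16 or newer; the helper method names are hypothetical:

```java
class InstanceofDemo {
    record Box<T>(T value) { }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static String describe(Object obj) {
        if (obj instanceof Box) {                  // checking the raw type works
            Box<Integer> box = (Box<Integer>) obj; // unchecked cast, only a warning
            return box.toString();                 // fine: the value is never used as an Integer
        }
        return "not a Box";
    }

    @SuppressWarnings("unchecked")
    static boolean valueAccessFails(Object obj) {
        Box<Integer> box = (Box<Integer>) obj;
        try {
            Integer value = box.value(); // the compiler-inserted cast to Integer fails
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object obj = new Box<>("text");
        System.out.println(describe(obj));         // Box[value=text]
        System.out.println(valueAccessFails(obj)); // true
    }
}
```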
Confusing Bounded Wildcards
Bounded wildcards are one of those features that are necessary and quite good, but most developers don’t use them in their day-to-day code.
The simplest way to remember the correct usage is the mnemonic PECS: Producer extends, Consumer super.
If an instance of a Generic type such as List<? extends Number> is supposed to produce values, like accessing its elements, using extends means it can contain Number or any of its subclasses, and every element can be read as a Number.
If the instance is supposed to consume data, like adding items to it, a List<? super Integer> can be a List<Integer>, List<Number>, or List<Object>, and we can safely add Integer instances to any of them.
Copying from one list to another illustrates PECS in a single method:
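A minimal sketch of such a copy method; the names are assumptions, and the signature follows the same producer/consumer pattern the JDK uses for Collections.copy:

```java
import java.util.ArrayList;
import java.util.List;

class PecsDemo {
    // src produces values (extends), dest consumes them (super)
    static <T> void copy(List<? extends T> src, List<? super T> dest) {
        for (T item : src) {
            dest.add(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> ints = List.of(1, 2, 3);
        List<Number> numbers = new ArrayList<>();
        copy(ints, numbers);
        System.out.println(numbers); // [1, 2, 3]
    }
}
```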
My Thoughts on Generics
Generic code allows us to create versatile and reusable code that can operate with any type based on our specified constraints. This minimizes code duplication and conveys its purpose in a more abstract way.
However, Generics are not without their flaws. In Java, Generics are definitely an essential feature. But to be honest, their implementation can sometimes feel convoluted and lacking, especially when compared to other languages.
For developers consuming generic APIs, they are generally straightforward to use and offer substantial functionality. Writing generic APIs, on the other hand, can be more challenging. Personally, I appreciate the idea of creating powerful yet complex tools to write code without imposing the same level of complexity on those who use the code.
Still, Type erasure can be frustrating at times. Conversely, it can also be beneficial to avoid dealing with parameterized types when they aren’t necessary. That was an actual issue I’ve encountered with Swift, where I couldn’t easily fall back to a raw type. Seeing the kinds of problems different Generic approaches have perfectly illustrates that any guarantee on one side may limit flexibility on the other.
Language designers always have to compromise and decide their top priority. And in good Java tradition, backward compatibility was a top priority.
No matter what the designers choose, there’s always controversy on “how to do it right”, especially for features that get added late into a language, like Generics.
Just look at Golang! There was much controversy and extensive debate around how and even if Generics should be introduced to the language. The eventual design was a compromise aimed at providing powerful new capabilities while adhering to Go’s simplicity, performance, and backward compatibility principles.
Resources
- Lessons: Generic (Updated) (Java Documentation)
- Parameterized Types (JLS §4.5)