Looking at Java 22: Class-File API
Class files and the underlying Bytecode serve as the universal language within the Java ecosystem. Parsing, generating, and transforming class files are essential tasks enabling many of the tools and libraries we use daily.
Today’s JEP 457 is a preview feature that introduces a brand-new API for working with Class files and aims to eventually replace the JDK’s internal copy of ASM.
Playing Catch-Up
The available libraries for parsing and generating Class files are plentiful, like ASM, BCEL, or Javassist. Each library is tailored to a specific need with its own design goals, advantages over others, and limitations.
Frameworks relying on Class-file generation usually bundle their favorite one. For example, the open-source framework I spend most of my time with, Apache Tapestry, bundles ASM to generate runtime Class files.
Another project you might have heard of that bundles a copy of ASM is the JDK.
It’s used in the `javac` compiler and tools like `jar` and `jlink`.
But using a third-party library in the JDK itself for such an essential task creates the problem of “playing catch up”.
For ASM to support a new JDK version X, it must wait for that version to be finalized before implementing its features.
However, the `javac` compiler can’t emit Class files containing those features until ASM has implemented them.
Instead, JDK X features can only be safely emitted with JDK X+1 when ASM has implemented them.
As you can see, that’s a vicious circle: tools that want to support the latest Class-file features can’t, because those features aren’t emitted yet, and new Class-file format features can’t be emitted because no tool supports them… And with the tightened release cadence of six months, the Class-file format might evolve quite quickly, forcing third-party libraries to catch up fast, or their users are left behind until a new version drops.
That’s the issue JEP 457 tries to solve by providing an API for processing class files that evolves alongside the JDK itself, making new Class-file format features available directly at release.
The New Class-File API
The Class-file API is located in the newly added `java.lang.classfile` package.
It’s defined by three main abstractions: elements, builders, and transforms.
Elements
An element describes some part of a class file. It’s immutable and can be as small as a single instruction or as large as the whole Class file, and everything in between, such as an attribute, a field, or a method.
Elements can further contain elements of their own, making them compound elements, like methods or classes.
Builders
For every type of compound element, there’s a matching builder equipped with specific building methods, such as `ClassBuilder::withMethod`.
A builder also serves as a `Consumer` of its element type.
Transforms
A transform is a function accepting an element and builder, guiding the process of how, or if, that element should be transformed into other elements.
Parsing Class Files with Patterns
Whereas ASM uses the visitor pattern to create its view of Class files, the new Class-File API goes a different way, relying on a feature first previewed in Java 17 and finalized in Java 21: pattern matching for `switch`.
The main type for reading Class files is `ClassModel`.
It’s an immutable description of the Class file with accessor methods for metadata and things like fields, attributes, etc., which are lazily inflated, meaning most parts of a Class file aren’t parsed if not needed.
To create such a model, we can convert bytes into an instance:
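A minimal sketch of that conversion, assuming the bytes come from a class that is already on the classpath. `ParseSketch`, `bytesOf`, and the choice of `java.util.ArrayList` are hypothetical and only there to make the example self-contained. On Java 22, the API is a preview feature, so the code needs `--enable-preview` to compile and run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;

class ParseSketch {

    // Hypothetical helper: loads the compiled bytes of an available class.
    static byte[] bytesOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No class file found for " + type);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Converts raw bytes into the immutable, lazily inflated model.
    static ClassModel parse(byte[] bytes) {
        return ClassFile.of().parse(bytes);
    }

    public static void main(String[] args) {
        ClassModel model = parse(bytesOf(java.util.ArrayList.class));
        System.out.println(model.thisClass().asInternalName());
        System.out.println("major version: " + model.majorVersion());
    }
}
```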
We can traverse the specific elements with loop-structures:
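As a rough illustration of the loop-based style, the following sketch walks the fields and methods of a parsed class through the `fields()` and `methods()` accessors. The class name `LoopTraversalSketch`, the `modelOf` helper, and the sample class are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.FieldModel;
import java.lang.classfile.MethodModel;
import java.util.ArrayList;
import java.util.List;

class LoopTraversalSketch {

    // Hypothetical helper; assumes the class file is readable as a resource.
    static ClassModel modelOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            return ClassFile.of().parse(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects a short description of every field and method.
    static List<String> memberNames(ClassModel model) {
        List<String> names = new ArrayList<>();
        for (FieldModel field : model.fields()) {
            names.add("field " + field.fieldName().stringValue());
        }
        for (MethodModel method : model.methods()) {
            names.add("method " + method.methodName().stringValue());
        }
        return names;
    }

    public static void main(String[] args) {
        memberNames(modelOf(ArrayList.class)).forEach(System.out::println);
    }
}
```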
However, the model being a series of Class elements, we can also utilize pattern matching for a more organized way of doing things:
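The pattern-matching variant could look roughly like this. It iterates the `ClassModel` directly and lets a `switch` pick out the element types of interest. `SwitchTraversalSketch` and `modelOf` are hypothetical names, and the `default` branch simply ignores everything that isn't a field or method.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassElement;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.FieldModel;
import java.lang.classfile.MethodModel;
import java.util.ArrayList;
import java.util.List;

class SwitchTraversalSketch {

    // Hypothetical helper; assumes the class file is readable as a resource.
    static ClassModel modelOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            return ClassFile.of().parse(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> describe(ClassModel model) {
        List<String> members = new ArrayList<>();
        // A ClassModel is iterable over its ClassElements.
        for (ClassElement element : model) {
            switch (element) {
                case FieldModel field -> members.add("field " + field.fieldName().stringValue());
                case MethodModel method -> members.add("method " + method.methodName().stringValue());
                default -> { } // flags, superclass, attributes, ...
            }
        }
        return members;
    }

    public static void main(String[] args) {
        describe(modelOf(ArrayList.class)).forEach(System.out::println);
    }
}
```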
A `ClassElement`, like a `MethodModel`, can be the source of more elements.
For example, to find all Classes whose fields and methods are accessed by the methods of the initial `ClassModel`, we need to go deeper:
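A sketch of such a nested traversal, modeled on the dependency-collection example in JEP 457. The class and helper names are hypothetical; the instruction types come from `java.lang.classfile.instruction`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassElement;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.CodeElement;
import java.lang.classfile.CodeModel;
import java.lang.classfile.MethodElement;
import java.lang.classfile.MethodModel;
import java.lang.classfile.instruction.FieldInstruction;
import java.lang.classfile.instruction.InvokeInstruction;
import java.lang.constant.ClassDesc;
import java.util.HashSet;
import java.util.Set;

class NestedDependencySketch {

    // Hypothetical helper; assumes the class file is readable as a resource.
    static ClassModel modelOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            return ClassFile.of().parse(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Class -> method -> code -> instruction: every level adds nesting.
    static Set<ClassDesc> dependencies(ClassModel model) {
        Set<ClassDesc> deps = new HashSet<>();
        for (ClassElement classElement : model) {
            if (classElement instanceof MethodModel method) {
                for (MethodElement methodElement : method) {
                    if (methodElement instanceof CodeModel code) {
                        for (CodeElement codeElement : code) {
                            switch (codeElement) {
                                case InvokeInstruction i -> deps.add(i.owner().asSymbol());
                                case FieldInstruction i -> deps.add(i.owner().asSymbol());
                                default -> { }
                            }
                        }
                    }
                }
            }
        }
        return deps;
    }

    public static void main(String[] args) {
        dependencies(modelOf(java.util.ArrayList.class))
                .forEach(dep -> System.out.println(dep.descriptorString()));
    }
}
```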
Having to go five levels deep is quite unwieldy… let’s convert it into a more readable `Stream<ClassElement>` pipeline:
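One way such a pipeline might look, using `elementStream()` on each compound element and `mapMulti` for the final step. The explicit type witnesses on `Stream.empty()` are my addition to keep type inference obvious; the class and helper names are again hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.CodeElement;
import java.lang.classfile.CodeModel;
import java.lang.classfile.MethodElement;
import java.lang.classfile.MethodModel;
import java.lang.classfile.instruction.FieldInstruction;
import java.lang.classfile.instruction.InvokeInstruction;
import java.lang.constant.ClassDesc;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamDependencySketch {

    // Hypothetical helper; assumes the class file is readable as a resource.
    static ClassModel modelOf(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            return ClassFile.of().parse(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<ClassDesc> dependencies(ClassModel model) {
        return model.elementStream()
                // class elements -> elements of each method
                .flatMap(ce -> ce instanceof MethodModel method
                        ? method.elementStream()
                        : Stream.<MethodElement>empty())
                // method elements -> elements of each code body
                .flatMap(me -> me instanceof CodeModel code
                        ? code.elementStream()
                        : Stream.<CodeElement>empty())
                // code elements -> owners of invoked methods and accessed fields
                .<ClassDesc>mapMulti((element, sink) -> {
                    switch (element) {
                        case InvokeInstruction i -> sink.accept(i.owner().asSymbol());
                        case FieldInstruction i -> sink.accept(i.owner().asSymbol());
                        default -> { }
                    }
                })
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        dependencies(modelOf(java.util.ArrayList.class))
                .forEach(dep -> System.out.println(dep.descriptorString()));
    }
}
```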
With a Stream pipeline, the different steps could be refactored to methods, so method references would clean up the code even more.
Generating a Class File
Parsing Class files is only one part of the equation. Another is generating new Class files.
Let’s say we want to generate a class containing the following simple method:
First, it’s the turn of ASM and its visitor-based approach:
To better understand what’s going on, here’s a mapping between the Java code and the resulting Bytecode:
Next, let’s take a look at how the new Class-file API deals with creating the method. Once again, the API departs from the visitor pattern and uses a lambda-accepting builder instead:
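To give an idea of the builder style, here is a sketch that generates a small class with explicit labels and branch instructions. It is a deliberately simplified stand-in: instead of an instance method calling other methods, it assumes a hypothetical static method `pick(boolean z, int x)` that returns `x + 1` or `x - 1`, so the result can be loaded and invoked without further scaffolding. The names `LabelBuilderSketch` and `Generated` are made up for the example. On Java 22, `--enable-preview` is required.

```java
import static java.lang.constant.ConstantDescs.CD_boolean;
import static java.lang.constant.ConstantDescs.CD_int;

import java.lang.classfile.ClassFile;
import java.lang.classfile.Label;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;

class LabelBuilderSketch {

    // Generates: public class Generated { public static int pick(boolean z, int x) }
    static byte[] generate() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        return ClassFile.of().build(ClassDesc.of("Generated"), classBuilder -> classBuilder
                .withFlags(AccessFlag.PUBLIC)
                .withMethodBody("pick", MethodTypeDesc.of(CD_int, CD_boolean, CD_int), flags, code -> {
                    Label otherwise = code.newLabel();
                    code.iload(0)                 // push z (slot 0 in a static method)
                        .ifeq(otherwise)          // z == false: jump to the else branch
                        .iload(1)
                        .iconst_1()
                        .iadd()
                        .ireturn()                // return x + 1
                        .labelBinding(otherwise)  // else branch starts here
                        .iload(1)
                        .iconst_1()
                        .isub()
                        .ireturn();               // return x - 1
                }));
    }

    // Loads the generated bytes in a throwaway class loader and calls pick reflectively.
    static int pick(boolean z, int x) {
        byte[] bytes = generate();
        try {
            Class<?> generated = new ClassLoader() {
                Class<?> define() {
                    return defineClass("Generated", bytes, 0, bytes.length);
                }
            }.define();
            return (int) generated.getMethod("pick", boolean.class, int.class).invoke(null, z, x);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pick(true, 41));
        System.out.println(pick(false, 41));
    }
}
```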
That’s way more straightforward and explicit.
Instead of passing instructions via `visitVarInsn`, the builder has many convenient methods like `aload` or `iload` directly built-in.
The only one I really dislike is `return_` with an underscore, but well, `return` is a reserved keyword.
Decoupling the builder from the visitor pattern also creates another opportunity: higher-level block scoping and local-variable index calculation.
Since the Class-file API handles block scoping for us, we no longer need to create labels or branch instructions manually, making the previous example actually readable for people not fluent in Bytecode:
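A sketch of the block-scoped style, using `CodeBuilder::ifThenElse` so that no label has to be created by hand. As an assumption for the sake of a runnable example, it generates a hypothetical static method `pick(boolean z, int x)` that returns `x + 1` or `x - 1`; `BlockBuilderSketch` and `Generated` are made-up names.

```java
import static java.lang.constant.ConstantDescs.CD_boolean;
import static java.lang.constant.ConstantDescs.CD_int;

import java.lang.classfile.ClassFile;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;

class BlockBuilderSketch {

    // Generates: public class Generated { public static int pick(boolean z, int x) }
    static byte[] generate() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        return ClassFile.of().build(ClassDesc.of("Generated"), classBuilder -> classBuilder
                .withFlags(AccessFlag.PUBLIC)
                .withMethodBody("pick", MethodTypeDesc.of(CD_int, CD_boolean, CD_int), flags, code -> code
                        .iload(0) // push z; ifThenElse branches on this value
                        .ifThenElse(
                                then -> then.iload(1).iconst_1().iadd().ireturn(),
                                otherwise -> otherwise.iload(1).iconst_1().isub().ireturn())));
    }

    // Loads the generated bytes in a throwaway class loader and calls pick reflectively.
    static int pick(boolean z, int x) {
        byte[] bytes = generate();
        try {
            Class<?> generated = new ClassLoader() {
                Class<?> define() {
                    return defineClass("Generated", bytes, 0, bytes.length);
                }
            }.define();
            return (int) generated.getMethod("pick", boolean.class, int.class).invoke(null, z, x);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pick(true, 41));
        System.out.println(pick(false, 41));
    }
}
```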
Labels and branches are still in the resulting Bytecode, as they are essential, but it’s all done automatically on our behalf! This results in more straightforward and maintainable code.
Transforming Class Files
Class-file libraries are often used to combine the steps of reading and writing into transformation; a Class is read and localized changes applied, but most of it passes through unchanged.
That’s why each builder has `with...` methods so that elements can just pass through without any transformation.
For example, let’s remove all methods from a Class whose names start with “debug” using a `for` loop:
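A sketch of that loop, following the shape of the example in JEP 457: every element is handed to the builder via `with`, except methods whose names start with "debug". To keep it self-contained, the input is a hypothetical class `Sample` generated on the spot; `DropDebugSketch` and its helpers are made-up names.

```java
import static java.lang.constant.ConstantDescs.CD_void;

import java.lang.classfile.ClassElement;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.MethodModel;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;
import java.util.List;

class DropDebugSketch {

    // Hypothetical input: class "Sample" with the empty static methods run() and debugState().
    static byte[] sample() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        MethodTypeDesc noArgsVoid = MethodTypeDesc.of(CD_void);
        return ClassFile.of().build(ClassDesc.of("Sample"), classBuilder -> classBuilder
                .withMethodBody("run", noArgsVoid, flags, code -> code.return_())
                .withMethodBody("debugState", noArgsVoid, flags, code -> code.return_()));
    }

    static byte[] withoutDebugMethods(byte[] bytes) {
        ClassFile classFile = ClassFile.of();
        ClassModel model = classFile.parse(bytes);
        return classFile.build(model.thisClass().asSymbol(), classBuilder -> {
            for (ClassElement element : model) {
                boolean isDebugMethod = element instanceof MethodModel method
                        && method.methodName().stringValue().startsWith("debug");
                if (!isDebugMethod) {
                    classBuilder.with(element); // pass through unchanged
                }
            }
        });
    }

    static List<String> methodNames(byte[] bytes) {
        return ClassFile.of().parse(bytes).methods().stream()
                .map(method -> method.methodName().stringValue())
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(methodNames(sample()));
        System.out.println(methodNames(withoutDebugMethods(sample())));
    }
}
```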
For simple transformations, this seems fine. However, navigating the Class-tree and examining each element creates a lot of repetitive boilerplate, and we end up a few levels deep in nested code, like in the next example.
The following code swaps the invocations of methods on Class `Foo` with another Class called `Bar`.
Formatting is a little off, but I wanted it to have a chance of fitting into the code box:
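A sketch of what such a nested rewrite can look like, loosely following the JEP 457 example. Two things are assumptions on my side: the input is a hypothetical class `Caller` generated on the spot, and the replacement instruction is created via `InvokeInstruction.of(...)` with a class entry from the builder's constant pool, which is one of several ways to emit an invoke instruction. On Java 22, `--enable-preview` is required.

```java
import static java.lang.constant.ConstantDescs.CD_void;

import java.lang.classfile.ClassElement;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.CodeElement;
import java.lang.classfile.CodeModel;
import java.lang.classfile.MethodElement;
import java.lang.classfile.MethodModel;
import java.lang.classfile.instruction.InvokeInstruction;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;
import java.util.ArrayList;
import java.util.List;

class NestedRewriteSketch {

    // Hypothetical input: class "Caller" whose static run() invokes the static Foo.hello().
    static byte[] sample() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        MethodTypeDesc noArgsVoid = MethodTypeDesc.of(CD_void);
        return ClassFile.of().build(ClassDesc.of("Caller"), classBuilder -> classBuilder
                .withMethodBody("run", noArgsVoid, flags, code -> code
                        .invokestatic(ClassDesc.of("Foo"), "hello", noArgsVoid)
                        .return_()));
    }

    static byte[] fooToBar(byte[] bytes) {
        ClassFile classFile = ClassFile.of();
        ClassModel model = classFile.parse(bytes);
        return classFile.build(model.thisClass().asSymbol(), classBuilder -> {
            for (ClassElement ce : model) {
                if (ce instanceof MethodModel method) {
                    classBuilder.withMethod(method.methodName().stringValue(),
                            method.methodTypeSymbol(), method.flags().flagsMask(), methodBuilder -> {
                        for (MethodElement me : method) {
                            if (me instanceof CodeModel code) {
                                methodBuilder.withCode(codeBuilder -> {
                                    for (CodeElement e : code) {
                                        switch (e) {
                                            // Same opcode, name, and type, but owned by Bar.
                                            case InvokeInstruction i when i.owner().asInternalName().equals("Foo") ->
                                                codeBuilder.with(InvokeInstruction.of(i.opcode(),
                                                        codeBuilder.constantPool().classEntry(ClassDesc.of("Bar")),
                                                        i.name(), i.type(), i.isInterface()));
                                            default -> codeBuilder.with(e);
                                        }
                                    }
                                });
                            } else {
                                methodBuilder.with(me);
                            }
                        }
                    });
                } else {
                    classBuilder.with(ce);
                }
            }
        });
    }

    // Lists the owner of every invoke instruction in the class.
    static List<String> invokedOwners(byte[] bytes) {
        List<String> owners = new ArrayList<>();
        for (MethodModel method : ClassFile.of().parse(bytes).methods()) {
            method.code().ifPresent(code -> {
                for (CodeElement e : code) {
                    if (e instanceof InvokeInstruction i) {
                        owners.add(i.owner().asInternalName());
                    }
                }
            });
        }
        return owners;
    }

    public static void main(String[] args) {
        System.out.println(invokedOwners(sample()));
        System.out.println(invokedOwners(fooToBar(sample())));
    }
}
```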
To untangle such complex code, we can use a lambda-based approach of transforms, just like before.
Transforms are functional interfaces and named after the element they affect (e.g., `ClassTransform`).
They accept a builder and an element, so they can either replace the element, drop it, or pass it through to the builder.
This way, the previous deeply nested example becomes more tangible:
All `for`-loops are gone, removing a lot of boilerplate.
Still, there’s a lot of nesting going on.
To simplify the code further, the instruction transformation can be refactored to a `CodeTransform`:
Then, the `CodeTransform`, a transform on code elements, is lifted into one on method elements:
The idea here is that the transformation is only applied to the related elements.
In this case, the `MethodTransform` will only transform Code elements, as it was lifted from a `CodeTransform`.
We don’t even have to stop here!
The `MethodTransform` can be lifted again into a transform for Class elements:
After all that lifting, the previous example becomes quite straightforward:
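Put together, the lifting steps might look like the following sketch: a `CodeTransform` is lifted with `MethodTransform.transformingCode` and then with `ClassTransform.transformingMethods`. The input class `Caller` and all names in `LiftedTransformSketch` are hypothetical. Applying the result through `classBuilder.transform(model, ...)` inside `build` is just one possible way to run it; `ClassFile` also offers a convenience method for this.

```java
import static java.lang.constant.ConstantDescs.CD_void;

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.ClassTransform;
import java.lang.classfile.CodeElement;
import java.lang.classfile.CodeTransform;
import java.lang.classfile.MethodModel;
import java.lang.classfile.MethodTransform;
import java.lang.classfile.instruction.InvokeInstruction;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;
import java.util.ArrayList;
import java.util.List;

class LiftedTransformSketch {

    // Hypothetical input: class "Caller" whose static run() invokes the static Foo.hello().
    static byte[] sample() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        MethodTypeDesc noArgsVoid = MethodTypeDesc.of(CD_void);
        return ClassFile.of().build(ClassDesc.of("Caller"), classBuilder -> classBuilder
                .withMethodBody("run", noArgsVoid, flags, code -> code
                        .invokestatic(ClassDesc.of("Foo"), "hello", noArgsVoid)
                        .return_()));
    }

    // Transform on code elements: redirect Foo invocations to Bar, pass the rest through.
    static final CodeTransform FOO_TO_BAR = (codeBuilder, element) -> {
        switch (element) {
            case InvokeInstruction i when i.owner().asInternalName().equals("Foo") ->
                codeBuilder.with(InvokeInstruction.of(i.opcode(),
                        codeBuilder.constantPool().classEntry(ClassDesc.of("Bar")),
                        i.name(), i.type(), i.isInterface()));
            default -> codeBuilder.with(element);
        }
    };

    static byte[] fooToBar(byte[] bytes) {
        // Lift: code transform -> method transform -> class transform.
        MethodTransform methodTransform = MethodTransform.transformingCode(FOO_TO_BAR);
        ClassTransform classTransform = ClassTransform.transformingMethods(methodTransform);

        ClassFile classFile = ClassFile.of();
        ClassModel model = classFile.parse(bytes);
        return classFile.build(model.thisClass().asSymbol(),
                classBuilder -> classBuilder.transform(model, classTransform));
    }

    // Lists the owner of every invoke instruction in the class.
    static List<String> invokedOwners(byte[] bytes) {
        List<String> owners = new ArrayList<>();
        for (MethodModel method : ClassFile.of().parse(bytes).methods()) {
            method.code().ifPresent(code -> {
                for (CodeElement e : code) {
                    if (e instanceof InvokeInstruction i) {
                        owners.add(i.owner().asInternalName());
                    }
                }
            });
        }
        return owners;
    }

    public static void main(String[] args) {
        System.out.println(invokedOwners(fooToBar(sample())));
    }
}
```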
Now that’s an improvement over the `for`-loop and even the initial lambda-based variant.
Transforms being lambdas enable another great functional feature: Composition.
This way, we can build a series of simple transformations for common use-cases and compose them as needed.
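As a sketch of such a composition, the following example chains two small transforms with `andThen`: one built with `ClassTransform.dropping` that removes "debug" methods, and one built with `ClassTransform.transformingMethodBodies` that redirects `Foo` invocations to `Bar`. The input class `Caller` and all names in `ComposedTransformSketch` are assumptions made for the example.

```java
import static java.lang.constant.ConstantDescs.CD_void;

import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.ClassTransform;
import java.lang.classfile.CodeElement;
import java.lang.classfile.MethodModel;
import java.lang.classfile.instruction.InvokeInstruction;
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;
import java.lang.reflect.AccessFlag;
import java.util.ArrayList;
import java.util.List;

class ComposedTransformSketch {

    // Hypothetical input: run() invokes Foo.hello(), debugState() is empty.
    static byte[] sample() {
        int flags = AccessFlag.PUBLIC.mask() | AccessFlag.STATIC.mask();
        MethodTypeDesc noArgsVoid = MethodTypeDesc.of(CD_void);
        return ClassFile.of().build(ClassDesc.of("Caller"), classBuilder -> classBuilder
                .withMethodBody("run", noArgsVoid, flags, code -> code
                        .invokestatic(ClassDesc.of("Foo"), "hello", noArgsVoid)
                        .return_())
                .withMethodBody("debugState", noArgsVoid, flags, code -> code.return_()));
    }

    // Simple building block 1: drop every method whose name starts with "debug".
    static final ClassTransform DROP_DEBUG = ClassTransform.dropping(element ->
            element instanceof MethodModel method
                    && method.methodName().stringValue().startsWith("debug"));

    // Simple building block 2: redirect Foo invocations to Bar in all method bodies.
    static final ClassTransform FOO_TO_BAR = ClassTransform.transformingMethodBodies(
            (codeBuilder, element) -> {
                switch (element) {
                    case InvokeInstruction i when i.owner().asInternalName().equals("Foo") ->
                        codeBuilder.with(InvokeInstruction.of(i.opcode(),
                                codeBuilder.constantPool().classEntry(ClassDesc.of("Bar")),
                                i.name(), i.type(), i.isInterface()));
                    default -> codeBuilder.with(element);
                }
            });

    static byte[] apply(byte[] bytes) {
        ClassTransform composed = DROP_DEBUG.andThen(FOO_TO_BAR);
        ClassFile classFile = ClassFile.of();
        ClassModel model = classFile.parse(bytes);
        return classFile.build(model.thisClass().asSymbol(),
                classBuilder -> classBuilder.transform(model, composed));
    }

    static List<String> methodNames(byte[] bytes) {
        return ClassFile.of().parse(bytes).methods().stream()
                .map(method -> method.methodName().stringValue())
                .toList();
    }

    // Lists the owner of every invoke instruction in the class.
    static List<String> invokedOwners(byte[] bytes) {
        List<String> owners = new ArrayList<>();
        for (MethodModel method : ClassFile.of().parse(bytes).methods()) {
            method.code().ifPresent(code -> {
                for (CodeElement e : code) {
                    if (e instanceof InvokeInstruction i) {
                        owners.add(i.owner().asInternalName());
                    }
                }
            });
        }
        return owners;
    }

    public static void main(String[] args) {
        byte[] result = apply(sample());
        System.out.println(methodNames(result));
        System.out.println(invokedOwners(result));
    }
}
```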
Design Philosophy of the Class-File API
The new API breaks away from a lot of things we’re used to with other Class-manipulation libraries. It might seem unfamiliar at first, but it’s all aligned with the design goals and principles the OpenJDK team has defined for the new API:
- Immutable Elements: To open the door for reliable and safe sharing, all Class-file elements (e.g., fields, methods, instructions, …) are represented by immutable objects.
- Tree Structure: Class files are represented by a tree of elements, which might have elements of their own.
- User-driven Navigation & Laziness: Navigating the tree is decided by where we, as the user, choose to go. Therefore, only the essential parts of the tree should be parsed to satisfy the current navigation requirements.
- Unified Streaming & Materialized Views: The new API supports a streaming and materialized representation of a Class file, just like ASM. Thanks to immutability and laziness, though, the materialized view is less expensive than ASM’s.
- Emergent Transformation: With the Class-file parsing and generation APIs being aligned, transformation does not require its own special mode and becomes an emergent property of the API.
- Detail Hiding: Class files consist of many parts, like the constant pool, bootstrap methods, etc., and it seems nonsensical to construct these manually all the time. So instead, the API will do the heavy lifting for us.
- Modern Language Features: ASM was released 22 years ago, in 2002. Since then, Java has evolved significantly, including lambdas, records, sealed classes, pattern matching, etc., and the new API is using them all to create a more flexible, pleasant, and modern experience.
Conclusion
With Class-file manipulation and Bytecode generation, there have always been calls for an “official” library for multiple reasons, such as having a “guaranteed always up-to-date” library. And the faster six-month release cadence created two more reasons to consider.
First, we encounter newer Class-file formats more often, as they might change every six months.
Second, Java evolves faster than ever, so there definitely will be more format changes in shorter timeframes than before.
Thankfully, instead of trying to just “standardize” an existing solution like ASM, the OpenJDK team decided to design a modern and versatile API from the ground up.
Don’t misunderstand me here; there’s nothing wrong with standardizing an existing library for the JDK. It worked great for JSR-310 and Joda Time. However, trying to standardize a design from 22 years ago might not be the best solution going forward, and a more modern approach, even if unfamiliar at first, will be the better solution in the long run.
One thing that needs to be mentioned is that the new API does not intend to replace ASM or any other Class-file library by creating “one bytecode library to rule them all”. Each of them has its own goals, pros, and cons. Over time, though, I’m sure it will take a big bite out of the existing library share.
But looking at the ASM-related code in my projects, replacing it would be a herculean effort and will take quite some time, if it even makes sense at all. Raising the minimum Java version would also be required, which isn’t simple.
Nevertheless, JEP 457 is another welcome addition to the JDK, even though it’s still in preview. It solves the chicken-and-egg problem of Class-file format changes and gives us much more. The design goals and principles behind it create a powerful, easy-to-use, and less error-prone way of working with Class files.
Looking at Java 22
- Intro
- Statements before super (JEP 447)
- Unnamed Variables & Patterns (JEP 456)
- Foreign Function & Memory API
- Stream Gatherers (JEP 461)