Building a Storage Engine in Modern Java with Panama (Part 1)
Append-Only, Memory Mapping, and Why Panama Changes the Rules
Java has long been dismissed for latency-sensitive systems programming. Not because it’s slow, but because it’s unpredictable.
Garbage collection pauses, object-heavy data structures, and decades of “just use C/C++ for this” folklore have taught us one thing: if latency and control matter, Java is most likely the wrong tool.
And that assumption was mostly true, unless we dared to dive into the obscure and dangerous territory of sun.misc.Unsafe.
But Java 22 changed the game with the finalization of the Foreign Function & Memory (FFM) API (Project Panama).
Java finally got explicit, safe, and deterministic control over off-heap memory with no Unsafe, without JNI, and without tricking the GC behind the scenes.
In this article, we’re going to take that power seriously.
We’ll build a minimal, append-only storage log from scratch using memory-mapped files and Panama’s MemorySegment.
No frameworks.
No abstractions.
Just bytes, layouts, and deliberate control.
Article I (this one):
We’ll discover the Panama fundamentals (Arena, MemorySegment, and VarHandle) to build a structured, memory-mapped append-only log.
Article II:
We’ll make our data durable and verifiable with checksums and structured records.
Article III:
We’ll eliminate GC pressure by building a custom off-heap hash index, aiming for fast, predictable read/write paths.
This is a learning journey for me as well: I’m not a systems programmer and have only touched native memory a few times before.
The Silent Killer: GC Pauses and the p99 Problem
We’ve all experienced it at some point in our careers: a service cruises along at optimal latency, doing its job just fine… until occasionally, mysteriously, it doesn’t. A single request out of a hundred suddenly takes 150ms instead of just the usual few ms.
That’s the p99 latency spike: the slowest 1% of requests.
In Java, the villain is usually the Garbage Collector.
Response Time (ms)
  150 |                                     *
      |                                     |
      |                                     |
  100 |                                     |
      |                                     |
      |                 *                   |
   50 |                 |                   |
      |     *           |         *         |
      | * * | * * * * * | * * * * | * * * * | *
    0 +---------------------------------------------> Time
            ^           ^                   ^
            |           |                   |
         Normal      GC Pause            GC Pause
                                 (The p99 Nightmare)
A “normal” GC pause is expected at some point at runtime. But these unpredictable p99 spikes destroy user experience and violate SLA guarantees.
Why does the GC cause this pain? One of the main causes is our love for caching data for performance.
Storing key-value associations is often done with a “naïve heap index” like this, a simple HashMap:
Map<String, Long> index = new HashMap<>();
At first, that’s absolutely fine, as introducing something like a “distributed caching system” comes with its own can of worms.
But what happens to that HashMap once we hit millions of entries?
Because a million-entry HashMap isn’t “a million things”:
- 1 HashMap instance
- 1M HashMap$Node instances
- 1M String instances
- 1M backing char[] arrays
- 1M Long instances
- and all the pointer chains between them
That’s millions of objects the GC must trace. And large, long-lived object graphs are toxic to predictable latency. They increase GC work, extend pause times, and make tail latency harder to control.
That’s not a GC tuning problem, it’s a structural one.
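To make that object count tangible, here is a minimal, self-contained sketch (the keys and the offset values are purely illustrative) of the kind of on-heap index that quietly becomes millions of GC-traced objects:
import java.util.HashMap;
import java.util.Map;

public class NaiveIndex {
    public static void main(String[] args) {
        Map<String, Long> index = new HashMap<>();
        // Every put allocates a HashMap$Node, and each key and value is an object too.
        for (long i = 0; i < 1_000_000; i++) {
            index.put("key-" + i, i * 64L); // hypothetical "record offset" values
        }
        // Millions of small, long-lived objects the GC now has to trace on every cycle.
        System.out.println("Entries: " + index.size());
    }
}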
Escaping the Heap: The Old Ways vs. The New Way
If the Garbage Collector is the problem, the easiest way to avoid it is to get our data out of its reach.
Of course, we could further optimize our critical code paths, reduce object allocation, or disable the GC altogether. But that’s not why we’re here.
The solution is to manage memory manually, off-heap, meaning native memory where the JVM doesn’t track objects, and the GC won’t lift a finger to clean up.
Keeping hot, large datasets off-heap whenever possible removes GC from the hot path and can significantly improve tail-latency predictability. But we also need to do everything ourselves.
The Old, Dangerous Path: sun.misc.Unsafe
Before Panama, the only way to get high-performance off-heap access was using the “dark arts” of the JDK: sun.misc.Unsafe:
// Obtain Unsafe instance via reflection
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
long address = unsafe.allocateMemory(1_024);
unsafe.putInt(address, 2_048); // no boundary checks
unsafe.freeMemory(address);
int value = unsafe.getInt(address); // Use-after-free: may SEGFAULT the whole JVM!
It was fast and it worked, but at the cost of safety, stability, and future compatibility. A single mistake meant a JVM crash instead of an exception.
With JEP 471, these memory-access methods are officially on the way out: the OpenJDK team has deprecated them for removal.
Unsafe is no longer merely discouraged.
It’s obsolete.
Though it will take a long transition period to wean the ecosystem off it.
The Clunky Path: ByteBuffer
ByteBuffer is a byte-centric API, with no first-class way to model complex layouts, structs, arrays of structs, etc.
It was a step up over Unsafe, but still a compromise, not a solution.
Memory-mapping a file with FileChannel.map returned a MappedByteBuffer.
When we map a file, the OS makes file contents appear as if they were in RAM. Reading/writing the memory actually reads/writes the file, but the OS handles the I/O transparently.
Unmapping ByteBuffers wasn’t deterministic, as it was the GC’s responsibility.
They are a stateful abstraction, making concurrent programming a minefield. Sharing a buffer between threads requires careful, manual synchronization not just of the data, but of the state pointers themselves, a common and frustrating source of bugs.
Furthermore, they were limited to a capacity of Integer.MAX_VALUE bytes, roughly 2GB, unless you chunk larger memory blocks across multiple buffers.
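To illustrate that pain, here is a rough sketch (a hypothetical helper, not code we’ll reuse) of the pre-Panama workaround for mapping a file larger than 2GB: splitting it across several MappedByteBuffers and doing the chunk arithmetic ourselves:
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import static java.nio.file.StandardOpenOption.*;

public class ChunkedMapping {
    // Maps a large file as a list of <=2GB chunks, because one buffer can't cover it.
    static List<MappedByteBuffer> mapLargeFile(Path path, long fileSize) throws IOException {
        long chunkSize = Integer.MAX_VALUE;
        List<MappedByteBuffer> chunks = new ArrayList<>();
        try (FileChannel fc = FileChannel.open(path, CREATE, READ, WRITE)) {
            for (long offset = 0; offset < fileSize; offset += chunkSize) {
                long length = Math.min(chunkSize, fileSize - offset);
                chunks.add(fc.map(FileChannel.MapMode.READ_WRITE, offset, length));
            }
        }
        // Every read/write now needs manual arithmetic: which chunk, which relative offset?
        return chunks;
    }
}
And even then, unmapping those buffers is still at the mercy of the GC.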
The Modern Path: Project Panama’s Foreign Function & Memory API (FFM)
Java 22+ ships the Foreign Function & Memory (FFM) API, a modern, safe, supported replacement for Unsafe/off-heap/JNI gymnastics.
FFM gives us:
- Structured, typed access to off-heap memory
- Safe pointers with bounds checks
- Deterministic lifetimes via Arena
- A clean model for native memory, mapped files, and foreign libraries
And crucially, it gives us actual performance with safety included.
Before diving into mapped files, let’s look at a minimum viable FFM example first.
The FFM Mental Model: Explicit, Scoped, and Boring (In a Good Way)
Project Panama’s FFM API looks intimidating at first, mostly because it forces us to think about memory the way systems code does.
That’s not an accident.
FFM replaces the old “trust me, I know what I’m doing” approach of Unsafe and the awkward statefulness of ByteBuffer with a small set of concepts that are deliberately explicit, creating quite readable code:
// Create off-heap memory arena
try (Arena arena = Arena.ofConfined()) {
// Allocate 16 bytes off-heap
MemorySegment memSeg = arena.allocate(16);
// Set value in a specific layout
memSeg.set(ValueLayout.JAVA_INT, 0, 123);
// Read off-heap memory from arena
int value = memSeg.get(ValueLayout.JAVA_INT, 0);
System.out.println(value); // 123
} // Arena closes, and memory is freed automatically -> no memory leaks
This is basically malloc + “safe pointer” + “automatic free when scope ends.”
Arena: Deterministic Lifecycles
An Arena owns memory.
It serves as a lifecycle manager, owning each MemorySegment we allocate, providing deterministic resource management.
No more praying for the GC.
Arena implements AutoCloseable, making the try-with-resources block our new best friend.
As the try-with-resources block dictates the segments’ lifecycle, we can’t use them after the arena is closed. This is a powerful, “safe by default” design that eliminates a whole class of use-after-free bugs.
Closing an Arena will invalidate its segments and release the associated memory deterministically, at least from Java’s perspective.
The actual reclamation depends on the OS.
No more forgetting to free memory, bye-bye memory leaks.
We’re going to use Arena.ofConfined().
A confined arena may only be accessed by the thread that created it, removing the need for synchronization, giving the JVM stronger optimization guarantees.
If we want cross-thread access, we need to opt into it explicitly with a shared arena.
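Here’s a minimal sketch of the difference. Accessing a confined arena’s segment from the wrong thread fails fast with a WrongThreadException, while a shared arena allows it and leaves the coordination to us:
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaConfinement {
    public static void main(String[] args) throws InterruptedException {
        try (Arena confined = Arena.ofConfined()) {
            MemorySegment seg = confined.allocate(ValueLayout.JAVA_LONG);
            Thread other = new Thread(() -> {
                try {
                    seg.set(ValueLayout.JAVA_LONG, 0, 42L); // not the owner thread
                } catch (WrongThreadException e) {
                    System.out.println("Confined arena rejected the access: " + e);
                }
            });
            other.start();
            other.join();
        }

        try (Arena shared = Arena.ofShared()) {
            MemorySegment seg = shared.allocate(ValueLayout.JAVA_LONG);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L); // any thread may access a shared arena
        }
    }
}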
MemorySegment: A Safe Pointer Without Hidden State
Whereas the Arena is the lifecycle, the MemorySegment is the data container.
It represents a contiguous region of memory, whether on- or off-heap or memory-mapped from a file.
Crucially, it is stateless.
There is no internal state: no position, no limit, no mutable cursor.
Every access is done via an explicit offset. That makes the code slightly more verbose, but also dramatically easier to reason about.
Unlike ByteBuffer, a MemorySegment can be larger than 2GB, meaning the addressable space is far beyond the int limit.
It is bounds-checked and can’t be accessed after its originating Arena is closed.
Misusing it, like reading past the end, throws an exception instead of crashing the JVM.
That’s actually the trade-off Panama makes consistently: fail fast in Java, never segfault the process.
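To see what “stateless” means in practice, here’s a tiny sketch: no flip(), no rewind(), every access simply names its offset (the values are arbitrary):
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffsetAccess {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 16 bytes, 8-byte aligned, so the accesses below are naturally aligned.
            MemorySegment seg = arena.allocate(16, 8);

            // No cursor to advance: every read and write names its offset explicitly.
            seg.set(ValueLayout.JAVA_INT, 0, 7);
            seg.set(ValueLayout.JAVA_INT, 4, 11);
            seg.set(ValueLayout.JAVA_LONG, 8, 1_000_000_000_000L);

            int a = seg.get(ValueLayout.JAVA_INT, 0);   // 7
            int b = seg.get(ValueLayout.JAVA_INT, 4);   // 11
            long c = seg.get(ValueLayout.JAVA_LONG, 8); // 1_000_000_000_000

            System.out.println(a + " " + b + " " + c);
        }
    }
}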
ValueLayout: Interpreting Raw Bytes
Memory itself is just a sequence of 1s and 0s, so how do we turn those bytes back into something like an int or a long?
That’s what ValueLayout is for, as it describes the structure of our data.
Instead of calling methods like getInt() on a buffer, we tell the segment how to interpret the data:
int value = segment.get(ValueLayout.JAVA_INT, offset);
This decouples the memory container (MemorySegment) from the data interpretation (ValueLayout).
Layouts define:
- size (e.g. 4 bytes for an int)
- byte order
- alignment expectations
A Critical Detail: Byte Order (Endianness)
When storing multi-byte values, such as integers, the byte order matters. Different CPU architectures use different conventions (little-endian vs. big-endian).
For cross-platform file formats, we must explicitly specify the byte order:
ValueLayout.JAVA_INT.withOrder(ByteOrder.LITTLE_ENDIAN)
That makes the stored data portable and safe to read/write regardless of the underlying CPU architecture.
Another aspect ValueLayout handles is alignment.
CPUs prefer “aligned” data, like an int starting at a memory address that’s divisible by 4.
If it’s not, performance will suffer.
It’s great if we can align data, but for our append-only log, it’s not possible, as it stores records of arbitrary sizes.
Using ValueLayout.JAVA_INT_UNALIGNED tells the JVM that the data may not be aligned, so it skips the alignment requirement for that access.
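Combining both concerns, a portable layout for our log would look something like this sketch: an int layout that is explicitly little-endian and tolerates arbitrary offsets:
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

public class UnalignedLittleEndian {
    // An unaligned, explicitly little-endian int layout: what a log with
    // arbitrary record sizes needs for a portable on-disk format.
    static final ValueLayout.OfInt INT_LE =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16);
            seg.set(INT_LE, 3, 42);                 // offset 3 is not 4-byte aligned: allowed
            System.out.println(seg.get(INT_LE, 3)); // 42, regardless of the host CPU's endianness
        }
    }
}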
What We Get From This Model
By using these abstractions, we get two critical guarantees:
Spatial Safety:
We can never read past the end of a segment. If we try, the JVM throws an exception, but the process itself won’t SEGFAULT.
Temporal Safety:
We can never read from a segment after its Arena has been closed.
We get the control of manual memory management without the traditional “footguns.”
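Both guarantees are easy to demonstrate in a few lines (a minimal sketch; the exact exception messages depend on the JDK build):
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SafetyGuarantees {
    public static void main(String[] args) {
        MemorySegment leaked;
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(4, 4);
            try {
                seg.get(ValueLayout.JAVA_INT, 8); // past the end of a 4-byte segment
            } catch (IndexOutOfBoundsException e) {
                System.out.println("Spatial safety: " + e.getMessage());
            }
            leaked = seg;
        }
        try {
            leaked.get(ValueLayout.JAVA_INT, 0); // the owning Arena is already closed
        } catch (IllegalStateException e) {
            System.out.println("Temporal safety: " + e.getMessage());
        }
    }
}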
This is the mental shift Panama requires.
No more “allocate memory and hope for the best.” We scope memory explicitly, use it, and invalidate it deterministically.
Once we accept that discipline, everything else in the API clicks into place.
Bonus: VarHandle, The Performance Accelerator
For reading and writing simple values, MemorySegment.get(...) and .set(...) work perfectly well.
But for high-frequency loops that access struct-like data repeatedly, Panama provides an advanced tool: VarHandle.
Think of VarHandle as a precompiled, strongly typed accessor for a specific memory layout.
The JVM’s JIT compiler can heavily optimize these operations, reducing them to minimal machine instructions—often just a single MOV instruction on x86.
We won’t need it for our append-only log, but when we build a performance-critical hash index in Article III, VarHandle will be our secret weapon for low-latency struct access.
It’s what transforms “safe” into “safe and fast.”
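As a small teaser (a hypothetical point struct, not the index format from Article III), this is roughly what a layout-derived VarHandle looks like; note that on current JDKs the handle takes the segment plus a base offset as coordinates:
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

public class VarHandleTeaser {
    // A hypothetical 8-byte "point" struct: two ints named x and y.
    static final MemoryLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));

    static final VarHandle X = POINT.varHandle(PathElement.groupElement("x"));
    static final VarHandle Y = POINT.varHandle(PathElement.groupElement("y"));

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment point = arena.allocate(POINT);
            X.set(point, 0L, 10); // (segment, base offset, value)
            Y.set(point, 0L, 20);
            System.out.println(X.get(point, 0L) + ", " + Y.get(point, 0L)); // 10, 20
        }
    }
}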
From Clunky Buffers to Elegant Segments
Memory-mapped files are not new in Java. What’s new is finally having a usable abstraction for them.
The Old Approach: MappedByteBuffer
Historically, mapping a file meant working with MappedByteBuffer.
On the surface, it seems like a straightforward solution. It technically worked, but it came with several structural problems that made it a poor foundation for systems code.
The most serious issue was lifecycle management.
A MappedByteBuffer is unmapped only when it becomes unreachable and the garbage collector decides to clean it up.
There is no supported, deterministic way to release the mapping.
In practice, this meant file handles and virtual memory regions could linger indefinitely, forcing developers to rely on undocumented tricks just to reclaim resources.
The second issue was statefulness.
Every ByteBuffer carries mutable cursor state (position, limit, capacity).
This makes concurrent or shared access fragile, since correctness depends not only on the data, but also on external discipline around buffer state.
Finally, the API itself imposed hard limits.
Capacities were indexed by int, making large mappings awkward and error-prone.
All of this worked against the needs of low-level storage code, where predictability matters more than convenience.
Panama’s MemorySegment fixes these problems by design.
Modern Approach with FFM
A memory-mapped file is just another MemorySegment, owned by an Arena.
When its Arena is closed, the mapping is released immediately and reliably.
The segment itself is stateless. There is no cursor to manage, no hidden mutation.
Every access is explicit, offset-based, and bounds-checked. That makes the code slightly more verbose, but also easier to reason about, especially under concurrency.
Most importantly, the lifecycle of the mapping is now explicit in code. We can see exactly where memory is acquired and exactly where it is released.
With that foundation, we can finally write a minimal append-only log without fighting the runtime:
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

import static java.nio.file.StandardOpenOption.*;

public class RawLog implements AutoCloseable {
private final Arena arena;
private final MemorySegment mappedSegment;
private long writeOffset;
public RawLog(Path path, long fileSize) throws IOException {
// STEP 1: Create thread-confined Arena.
this.arena = Arena.ofConfined();
// STEP 2: Pre-allocate the file size on disk.
// Make sure it has the required size by writing at the end.
try (FileChannel fc = FileChannel.open(path, CREATE, READ, WRITE)) {
fc.position(fileSize - 1);
fc.write(ByteBuffer.wrap(new byte[]{0}));
// STEP 3: Map the file into memory
this.mappedSegment = fc.map(
FileChannel.MapMode.READ_WRITE,
0,
fileSize,
arena
);
}
this.writeOffset = 0;
}
public void append(byte[] data) {
// STEP 1: Defensive check: Do we have space?
if (writeOffset + data.length > this.mappedSegment.byteSize()) {
throw new IllegalStateException("Log is full.");
}
// STEP 2: Efficient Memory Copy.
// Create a MemorySegment where the data will go
MemorySegment dst = mappedSegment.asSlice(writeOffset, data.length);
// Create a MemorySegment of the data we can copy
MemorySegment src = MemorySegment.ofArray(data);
// Copy data from src into dst
dst.copyFrom(src);
// STEP 3: Update write offset
writeOffset += data.length;
}
@Override
public void close() {
try {
// Request the OS to flush dirty pages to stable storage
this.mappedSegment.force();
}
finally {
// Explicitly release the mapped memory and file handles.
// If we forget to call close(), the mapping can linger:
// a confined Arena relies on explicit, deterministic closing.
arena.close();
}
}
}
This version is intentionally single-threaded and lacks record structure.
The First Failure
We have something that looks promising.
The RawLog class is clean and uses a modern and powerful new Java API.
We can append data directly into a memory-mapped file.
There’s no GC pressure, no object graph, no hidden lifecycle.
Writes are fast, simple, and explicit.
Unfortunately, that’s not enough…
Let’s write a simple main method that appends two distinct messages to our log and then tries to read them:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.charset.StandardCharsets;
public class Main {
public static void main(String[] args) throws Exception {
Path logFile = Files.createTempFile("storage-", ".log");
// Allocate a 1KB log file
try (RawLog log = new RawLog(logFile, 1_024)) {
// Append two separate messages
log.append("Hello".getBytes(StandardCharsets.UTF_8));
log.append("World".getBytes(StandardCharsets.UTF_8));
System.out.println("Appended two messages.");
}
// The Arena is closed here. The OS buffers are flushed (eventually).
// Verification: Read the raw file from disk
byte[] fileContent = Files.readAllBytes(logFile);
String contentAsString = new String(fileContent, StandardCharsets.UTF_8);
// We trim() because the file is padded with null bytes to 1024
System.out.println("Raw file content: '" + contentAsString.trim() + "'");
}
}
When we run this code, the output will be:
Appended two messages.
Raw file content: 'HelloWorld'
Our two distinct append calls have been smeared together into a single continuous stream of bytes.
What we’ve built so far is a shapeless bag of bytes.
Not only can we not tell where ‘Hello’ ends, but we also have no way of knowing whether the bytes are correct or even fully written to disk.
This demonstrates our first critical flaw and makes our problems ahead more tangible.
The “Uh Oh” Moment: Analysis of Failure and What’s Ahead
Our RawLog is a major step forward, and we successfully wrote data to the disk using high-performance mapped memory!
It’s a powerful foundation that actually stores bytes via FFM, but it’s not a storage system (yet).
At this point, we have something that looks impressive.
A fast append-only log.
Zero GC pressure.
Memory-mapped I/O with explicit lifetimes.
Clean, modern Java.
Unfortunately, it’s also dangerously naïve.
It’s missing three non-negotiable guarantees to be called an actual storage system:
The Framing Problem (No Structure):
To be useful, we need a way to delineate records, a system for framing our data. Without knowing where one record ends and the next one starts, we are simply storing gibberish (see the sketch after this list for a preview).
The Trust Problem (No Integrity):
Currently, corrupted data would be read without any complaint. This is silent corruption, the most dangerous failure mode in any data system. We cannot simply trust our storage medium; we must be able to verify that what we read is exactly what we wrote.
The Durability Problem (The Power Cord Test):
When our append method returns, the data has only been copied to the OS page cache. A durable system requires an explicit contract with the OS that says: “Do whatever it takes to get this data onto a non-volatile medium right now.” Without it, our “storage” engine is just a volatile, in-memory cache with a backup plan.
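To make the first two problems concrete, here is a rough preview of where Article II is heading. The frame layout below is a hypothetical sketch, not the final format: a little-endian length prefix and a CRC32C checksum in front of each payload:
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.util.zip.CRC32C;

public class RecordFraming {
    // Hypothetical frame: [ length:int | crc32c:int | payload:length bytes ]
    static final ValueLayout.OfInt INT_LE =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    // Writes one framed record at the given offset and returns the next free offset.
    static long writeRecord(MemorySegment log, long offset, byte[] payload) {
        CRC32C crc = new CRC32C();
        crc.update(payload);

        log.set(INT_LE, offset, payload.length);           // framing: how long is the record?
        log.set(INT_LE, offset + 4, (int) crc.getValue()); // integrity: what should it hash to?
        log.asSlice(offset + 8, payload.length)
           .copyFrom(MemorySegment.ofArray(payload));      // the payload itself
        return offset + 8 + payload.length;
    }
}
Durability, the third problem, is where force(), roughly the mapped-file counterpart of fsync, becomes an explicit part of the contract in Part II.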
None of these are corner cases. They’re the default failure modes of real systems.
In other words: our storage engine is fast, elegant, and… unfit for reality.
In Part II, we’ll fix that properly.
We’ll design a self-describing on-disk format, add checksums to detect corruption, and make durability explicit using fsync.
Same raw performance, but with structure, integrity checks, and an explicit durability contract.
