Building a Storage Engine in Modern Java with Panama (Part 1)

Append-Only, Memory Mapping, and Why Panama Changes the Rules

15 min read

Java has long been dismissed for latency-sensitive systems programming. Not because it’s slow, but because it’s unpredictable.

Garbage collection pauses, object-heavy data structures, and decades of “just use C/C++ for this” folklore have taught us one thing: if latency and control matter, Java is most likely the wrong tool. And that assumption was mostly true, unless we dared to dive into the obscure and dangerous territory of sun.misc.Unsafe.

But Java 22 changed the game with the finalization of the Foreign Function & Memory (FFM) API (Project Panama). Java finally got explicit, safe, and deterministic control over off-heap memory: without Unsafe, without JNI, and without tricking the GC behind the scenes.

In this article, we’re going to take that power seriously.

We’ll build a minimal, append-only storage log from scratch using memory-mapped files and Panama’s MemorySegment.
No frameworks.
No abstractions.
Just bytes, layouts, and deliberate control.

  • Article I (this one):
    We’ll discover the Panama fundamentals (Arena, MemorySegment, and VarHandle) to build a structured, memory-mapped append-only log.

  • Article II:
    We’ll make our data durable and verifiable with checksums and structured records.

  • Article III:
    We’ll eliminate GC pressure by building a custom off-heap hash index, aiming for fast, predictable read/write paths.

This is a learning journey for me as well, as I’m not a systems programmer and have only touched native memory a few times before.


The Silent Killer: GC Pauses and the p99 Problem

We’ve all experienced it at some point in our careers: a service cruises along at optimal latency, doing its job just fine… until occasionally, mysteriously, it doesn’t. A single request out of a hundred suddenly takes 150ms instead of just the usual few ms.

That’s the p99 latency spike: the slowest 1% of requests.

In Java, the villain is usually the Garbage Collector.

text
Response Time (ms)                    *
150 |                                 |
    |                                 |
    |                                 |
100 |                                 |
    |                                 |
    |                 *               |
 50 |                 |               |
    |     *     *     |         *     |
    |  *  |  *  |  *  |  *   *  |  *  |  *
  0 +----------------------------------------> Time
      ^               ^               ^
      |               |               |
    Normal         GC Pause        GC Pause
                               (The p99 Nightmare)

A “normal” GC pause is expected at some point during runtime. But these unpredictable p99 spikes destroy user experience and violate SLA guarantees.

Why does the GC cause this pain? One of the main causes is our love for caching data for performance.

Storing key-value associations is often done with a “naïve heap index” like this, a simple HashMap:

java
Map<String, Long> index = new HashMap<>();

At first, that’s absolutely fine, as introducing something like a “distributed caching system” comes with its own can of worms.

But what about HashMap if we hit millions of entries?

Because a million-entry HashMap isn’t “a million things”:

  • 1 HashMap instance
  • 1M HashMap$Node instances
  • 1M String instances
  • 1M backing byte[] arrays (String is backed by byte[] since Java 9’s compact strings)
  • 1M Long instances
  • and all the pointer chains between them

That’s millions of objects the GC must trace. And large, long-lived object graphs are toxic to predictable latency. They increase GC work, extend pause times, and make tail latency harder to control.

That’s not a GC tuning problem, it’s a structural one.


Escaping the Heap: The Old Ways vs. The New Way

If the Garbage Collector is the problem, the easiest way to avoid it is to get our data out of its reach.

Of course, we could further optimize our critical code paths, reduce object allocation, or disable the GC altogether. But that’s not why we’re here.

The solution is to manage memory manually, off-heap, meaning native memory where the JVM doesn’t track objects, and the GC won’t lift a finger to clean up.

Keeping hot, large datasets off-heap whenever possible removes GC from the hot path and can significantly improve tail-latency predictability. But we also need to do everything ourselves.

The Old, Dangerous Path: sun.misc.Unsafe

Before Panama, the only way to get high-performance off-heap access was using the “dark arts” of the JDK: sun.misc.Unsafe:

java
// Obtain Unsafe instance via reflection
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);

long address = unsafe.allocateMemory(1_024);
unsafe.putInt(address, 2_048); // no boundary checks
unsafe.freeMemory(address);
int value = unsafe.getInt(address); // Use-after-free error: SEGFAULT!

It was fast and it worked, but at the cost of safety, stability, and future compatibility. A single mistake meant a JVM crash instead of an exception.

With JEP 471, these APIs are officially on the way out: the OpenJDK team has deprecated Unsafe’s memory-access methods for removal.

Unsafe is no longer merely discouraged.
It’s obsolete. Though it will take a long transition period to wean the ecosystem off it.

The Clunky Path: ByteBuffer

ByteBuffer was a step up from Unsafe, but still a compromise, not a solution. It’s a byte-centric API with no first-class way to model complex layouts: structs, arrays of structs, etc.

Memory-mapping a file with FileChannel.map returned a MappedByteBuffer.

When we map a file, the OS makes file contents appear as if they were in RAM. Reading/writing the memory actually reads/writes the file, but the OS handles the I/O transparently.

Unmapping ByteBuffers wasn’t deterministic either, as it was the GC’s responsibility.

ByteBuffers are also a stateful abstraction, making concurrent programming a minefield. Sharing a buffer between threads requires careful, manual synchronization not just of the data, but of the cursor state itself, a common and frustrating source of bugs.

Furthermore, they were limited to a capacity of Integer.MAX_VALUE bytes, meaning roughly 2GB max unless you chunk larger memory blocks into multiple buffers.
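
To make that clunkiness concrete, here’s a minimal sketch of the old approach (assuming static imports of StandardOpenOption.READ and WRITE, and an existing Path named path):

java
// Map 1KB of a file as a MappedByteBuffer (the old way)
try (FileChannel fc = FileChannel.open(path, READ, WRITE)) {
    MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, 1_024);

    buf.putInt(42);           // advances the hidden position cursor
    buf.flip();               // stateful dance required before reading back
    int value = buf.getInt(); // 42

    // No supported way to unmap 'buf' here;
    // the mapping lives until the GC collects the buffer.
}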

The Modern Path: Project Panama’s Foreign Function & Memory API (FFM)

Java 22+ ships the Foreign Function & Memory (FFM) API, a modern, safe, supported replacement for Unsafe/off-heap/JNI gymnastics.

FFM gives us:

  • Structured, typed access to off-heap memory
  • Safe pointers with bounds checks
  • Deterministic lifetimes via Arena
  • A clean model for native memory, mapped files, and foreign libraries

And crucially, it gives us actual performance with safety included.

Before diving into mapped files, let’s look at a minimum viable FFM example first.


The FFM Mental Model: Explicit, Scoped, and Boring (In a Good Way)

Project Panama’s FFM API looks intimidating at first, mostly because it forces us to think about memory the way systems code does.

That’s not an accident.

FFM replaces the old “trust me, I know what I’m doing” approach of Unsafe and the awkward statefulness of ByteBuffer with a small set of concepts that are deliberately explicit, resulting in quite readable code:

java
// Create off-heap memory arena
try (Arena arena = Arena.ofConfined()) {
    // Allocate 16 bytes off-heap
    MemorySegment memSeg = arena.allocate(16);

    // Set value in a specific layout
    memSeg.set(ValueLayout.JAVA_INT, 0, 123);

    // Read off-heap memory from arena
    int value = memSeg.get(ValueLayout.JAVA_INT, 0);
    System.out.println(value); // 123
} // Arena closes, and memory is freed automatically -> no memory leaks

This is basically malloc + “safe pointer” + “automatic free when scope ends.”

Arena: Deterministic Lifecycles

An Arena owns memory.

It serves as a lifecycle manager, owning each MemorySegment we allocate, providing deterministic resource management.

No more praying for the GC.

Arena implements AutoCloseable, making the try-with-resources block our new best friend.

As the block dictates the arena’s lifecycle, its segments can’t be used after it’s closed. This is a powerful design that eliminates a whole class of use-after-free bugs and data races.

Closing an Arena will invalidate its segments and release the associated memory deterministically, at least from Java’s perspective. The actual reclamation depends on the OS.

No more forgetting to free memory, bye-bye memory leaks.

We’re going to use Arena.ofConfined().

A confined arena may only be accessed by the thread that created it, removing the need for synchronization, giving the JVM stronger optimization guarantees.

If we want cross-thread access, we need to opt into it explicitly with a shared arena.
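
For completeness, here’s a minimal sketch of cross-thread access with a shared arena (the class name SharedArenaDemo is ours):

java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SharedArenaDemo {

    public static void main(String[] args) throws InterruptedException {
        try (Arena shared = Arena.ofShared()) {
            MemorySegment seg = shared.allocate(ValueLayout.JAVA_LONG);

            // Another thread may write into the shared segment...
            Thread worker = Thread.ofPlatform()
                .start(() -> seg.set(ValueLayout.JAVA_LONG, 0, 42L));
            worker.join();

            // ...and the creating thread reads it back.
            System.out.println(seg.get(ValueLayout.JAVA_LONG, 0)); // 42
        } // close() succeeds once no other thread is accessing the segments
    }
}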

MemorySegment: A Safe Pointer Without Hidden State

Whereas the Arena is the lifecycle, the MemorySegment is the data container. It represents a contiguous region of memory, whether on-heap, off-heap, or memory-mapped from a file.

Crucially, it is stateless. There is no internal state like, position, no limit, no mutable cursor.

Every access is done via an explicit offset. That might make the code slightly more verbose, but it’s dramatically easier to reason about.

Unlike ByteBuffer, a MemorySegment can be larger than 2GB, meaning the addressable space goes far beyond the int limit. It provides bounds checking and can’t be accessed after its originating Arena is closed.

Misusing it, like reading past the end, throws an exception instead of crashing the JVM.

That’s actually the trade-off Panama makes consistently: fail fast in Java, never segfault the process.
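
A quick sketch of both guarantees in action (the variable names are ours):

java
MemorySegment escaped;
try (Arena arena = Arena.ofConfined()) {
    MemorySegment seg = arena.allocate(8);

    // Spatial safety: offset 8 + 4 bytes exceeds the 8-byte segment
    try {
        seg.get(ValueLayout.JAVA_INT, 8);
    } catch (IndexOutOfBoundsException e) {
        System.out.println("Out of bounds: " + e.getMessage());
    }

    escaped = seg;
}

// Temporal safety: the Arena is closed, so the segment is invalid
try {
    escaped.get(ValueLayout.JAVA_INT, 0);
} catch (IllegalStateException e) {
    System.out.println("Already closed: " + e.getMessage());
}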

ValueLayout: Interpreting Raw Bytes

Memory itself is just a sequence of 1s and 0s, so how do we turn those bits back into something like an int or long? That’s what ValueLayout is for: it describes the structure of our data.

Instead of calling methods like getInt() on a buffer, we tell the segment how to interpret the data:

java
int value = segment.get(ValueLayout.JAVA_INT, offset);

This decouples the memory container (MemorySegment) from the data interpretation (ValueLayout).

Layouts define:

  • size (e.g. 4 bytes for an int)
  • byte order
  • alignment expectations

A Critical Detail: Byte Order (Endianness)

When storing multi-byte values, such as integers, the byte order matters. Different CPU architectures use different conventions (little-endian vs. big-endian).

For cross-platform file formats, we must explicitly specify the byte order:

java
ValueLayout.JAVA_INT.withOrder(ByteOrder.LITTLE_ENDIAN)

That makes the stored data portable and safe to read/write regardless of the underlying CPU architecture.

Another aspect ValueLayout handles is alignment. CPUs prefer “aligned” data, like an int starting at a memory address divisible by 4; unaligned access can be slower on some architectures. Aligning data is great when we can do it, but our append-only log stores records of arbitrary sizes, so we can’t.

Using ValueLayout.JAVA_INT_UNALIGNED tells the JVM that the data may not be aligned. (Accessing a misaligned offset through the regular, aligned JAVA_INT layout would throw an exception instead.)
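
Putting byte order and alignment together, here’s a sketch of a portable, alignment-free layout (the name LE_INT is ours):

java
ValueLayout.OfInt LE_INT =
    ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

try (Arena arena = Arena.ofConfined()) {
    MemorySegment seg = arena.allocate(16);

    seg.set(LE_INT, 1, 0xCAFEBABE); // odd offset: fine with the unaligned layout
    int value = seg.get(LE_INT, 1);

    // seg.set(ValueLayout.JAVA_INT, 1, 0xCAFEBABE); // aligned layout would throw
}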

What We Get From This Model

By using these abstractions, we get two critical guarantees:

  • Spatial Safety:
    We can never read past the end of a segment. If we try, the JVM throws an exception, but the process itself won’t SEGFAULT.

  • Temporal Safety:
    We can never read from a segment after its Arena has been closed.

We get the control of manual memory management without the traditional “footguns.”

This is the mental shift Panama requires.

No more “allocate memory and hope for the best.” We scope memory explicitly, use it, and invalidate it deterministically.

Once we accept that discipline, everything else in the API clicks into place.

Bonus: VarHandle, The Performance Accelerator

For reading and writing simple values, MemorySegment.get(...) and .set(...) work perfectly well. But for high-frequency loops that access struct-like data repeatedly, Panama provides an advanced tool: VarHandle.

Think of VarHandle as a precompiled, strongly typed accessor for a specific memory layout. The JVM’s JIT compiler can heavily optimize these operations, reducing them to minimal machine instructions, often just a single MOV instruction on x86.

We won’t need it for our append-only log, but when we build a performance-critical hash index in Article III, VarHandle will be our secret weapon for low-latency struct access. It’s what transforms “safe” into “safe and fast.”
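
As a small preview of what that looks like (the layout and accessor names are ours, assuming the usual java.lang.foreign and java.lang.invoke imports):

java
// Describe a struct-like layout: struct { int x; int y; }
MemoryLayout POINT = MemoryLayout.structLayout(
    ValueLayout.JAVA_INT.withName("x"),
    ValueLayout.JAVA_INT.withName("y"));

// A precompiled, strongly typed accessor for the "x" field
VarHandle X = POINT.varHandle(MemoryLayout.PathElement.groupElement("x"));

try (Arena arena = Arena.ofConfined()) {
    MemorySegment point = arena.allocate(POINT);

    X.set(point, 0L, 42);           // coordinates: (segment, base offset, value)
    int x = (int) X.get(point, 0L); // 42
}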


From Clunky Buffers to Elegant Segments

Memory-mapped files are not new in Java. What’s new is finally having a usable abstraction for them.

The Old Approach: MappedByteBuffer

Historically, mapping a file meant working with MappedByteBuffer.

On the surface, it seems like a straightforward solution. It technically worked, but it came with several structural problems that made it a poor foundation for systems code.

The most serious issue was lifecycle management.

A MappedByteBuffer is unmapped only when it becomes unreachable and the garbage collector decides to clean it up. There is no supported, deterministic way to release the mapping.

In practice, this meant file handles and virtual memory regions could linger indefinitely, forcing developers to rely on undocumented tricks just to reclaim resources.

The second issue was statefulness.

Every ByteBuffer carries mutable cursor state (position, limit, mark). This makes concurrent or shared access fragile, since correctness depends not only on the data, but also on external discipline around buffer state.

Finally, the API itself imposed hard limits. Capacities were indexed by int, making large mappings awkward and error-prone.

All of this worked against the needs of low-level storage code, where predictability matters more than convenience.

Panama’s MemorySegment fixes these problems by design.

Modern Approach with FFM

A memory-mapped file is just another MemorySegment, owned by an Arena. When its Arena is closed, the mapping is released immediately and reliably.

The segment itself is stateless. There is no cursor to manage, no hidden mutation.

Every access is explicit, offset-based, and bounds-checked. That makes the code slightly more verbose, but also easier to reason about, especially under concurrency.

Most importantly, the lifecycle of the mapping is now explicit in code. We can see exactly where memory is acquired and exactly where it is released.

With that foundation, we can finally write a minimal append-only log without fighting the runtime:

java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

import static java.nio.file.StandardOpenOption.*;

public class RawLog implements AutoCloseable {

  private final Arena         arena;
  private final MemorySegment mappedSegment;

  private long writeOffset;

  public RawLog(Path path, long fileSize) throws IOException {

    // STEP 1: Create thread-confined Arena.
    this.arena = Arena.ofConfined();

    // STEP 2: Pre-allocate the file size on disk.
    //         Make sure it has the required size by writing at the end.
    try (FileChannel fc = FileChannel.open(path, CREATE, READ, WRITE)) {
      fc.position(fileSize - 1);
      fc.write(ByteBuffer.wrap(new byte[]{0}));

      // STEP 3: Map the file into memory
      this.mappedSegment = fc.map(
        FileChannel.MapMode.READ_WRITE,
        0,
        fileSize,
        arena
      );
    }

    this.writeOffset = 0;
  }

  public void append(byte[] data) {
    // STEP 1: Defensive check: Do we have space?
    if (writeOffset + data.length > this.mappedSegment.byteSize()) {
      throw new IllegalStateException("Log is full.");
    }

    // STEP 2: Efficient Memory Copy.

    // Create a MemorySegment where the data will go
    MemorySegment dst = mappedSegment.asSlice(writeOffset, data.length);

    // Wrap the heap array in a MemorySegment we can copy from
    MemorySegment src = MemorySegment.ofArray(data);

    // Copy data from src into dst
    dst.copyFrom(src);

    // STEP 3: Update write offset
    writeOffset += data.length;
  }

  @Override
  public void close() {
    try {
      // Request the OS to flush dirty pages to stable storage
      this.mappedSegment.force();
    }
    finally {
      // Explicitly release the mapped memory.
      // A confined arena frees its memory only when close() is called,
      // so forgetting to call close() would leak the mapping.
      arena.close();
    }
  }
}

Repository: RawLog.java

This version is intentionally single-threaded and lacks record structure.


The First Failure

We have something that looks promising.

The RawLog class is clean and uses a modern and powerful new Java API. We can append data directly into a memory-mapped file. There’s no GC pressure, no object graph, no hidden lifecycle. Writes are fast, simple, and explicit.

Unfortunately, that’s not enough…

Let’s write a simple main method that appends two distinct messages to our log and then tries to read them:

java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.charset.StandardCharsets;

public class Main {

  public static void main(String[] args) throws Exception {
    Path logFile = Files.createTempFile("storage-", ".log");

    // Allocate a 1KB log file
    try (RawLog log = new RawLog(logFile, 1_024)) {
      // Append two separate messages
      log.append("Hello".getBytes(StandardCharsets.UTF_8));
      log.append("World".getBytes(StandardCharsets.UTF_8));
      System.out.println("Appended two messages.");
    }

    // The Arena is closed here; RawLog.close() already forced dirty pages to disk.

    // Verification: Read the raw file from disk
    byte[] fileContent = Files.readAllBytes(logFile);
    String contentAsString = new String(fileContent, StandardCharsets.UTF_8);
    // We trim() because the file is padded with null bytes to 1024
    System.out.println("Raw file content: '" + contentAsString.trim() + "'");
  }
}

Repository: Main.java

When we run this code, the output will be:

text
Appended two messages.
Raw file content: 'HelloWorld'

Our two distinct append calls have been smeared together into a single continuous stream of bytes. What we’ve built so far is a shapeless bag of bytes.

Not only can we not tell where ‘Hello’ ends, but we also have no way of knowing whether the bytes are correct or even fully written to disk.

This demonstrates our first critical flaw and makes our problems ahead more tangible.


The “Uh Oh” Moment: Analysis of Failure and What’s Ahead

Our RawLog is a major step forward, and we successfully wrote data to the disk using high-performance mapped memory! It’s a powerful foundation that actually stores bytes via FFM, but it’s not a storage system (yet).

At this point, we have something that looks impressive.

A fast append-only log.
Zero GC pressure.
Memory-mapped I/O with explicit lifetimes.
Clean, modern Java.

Unfortunately, it’s also dangerously naïve.

It’s missing three non-negotiable guarantees to be called an actual storage system:

  • The Framing Problem (No Structure):
    To be useful, we need a way to delineate records, a system for framing our data. Without knowing where one record ends and the next one starts, we are simply storing gibberish.

  • The Trust Problem (No Integrity):
    Currently, corrupted data would be read back without any complaint. This is silent corruption, the most dangerous failure mode in any data system. We cannot simply trust our storage medium; we must be able to verify that what we read is exactly what we wrote.

  • The Durability Problem (The Power Cord Test):
    When our append method returns, the data has only been copied to the OS page cache, not necessarily to disk. A durable system requires an explicit contract with the OS that says: “Do whatever it takes to get this data onto a non-volatile medium right now.” Without it, our “storage” engine is just a volatile, in-memory cache with a backup plan (see the sketch after this list).
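
That contract is a single call away with FFM. A sketch of what Part II will make explicit, reusing the mapped segment from RawLog:

java
// Ask the OS to write the mapped region's dirty pages to stable
// storage (an fsync-like barrier) before acknowledging the write.
mappedSegment.force();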

None of these are corner cases. They’re the default failure modes of real systems.

In other words: our storage engine is fast, elegant, and… unfit for reality.

In Part II, we’ll fix that properly.

We’ll design a self-describing on-disk format, add checksums to detect corruption, and make durability explicit using fsync.

Same raw performance, but with structure, integrity checks, and an explicit durability contract.


Interested in using functional concepts and techniques in your Java code?
Check out my book, A Functional Approach to Java!
Available in English, Polish, and Korean.