GC-Free Indexing and Compaction (Part 3)

Off-Heap Hash Tables, VarHandles, and the Reality of Append-Only Systems


After Part I gave us raw I/O performance and Part II gave us safety and durability, two fundamental flaws remain:

  • Reads are slow:
    Every lookup requires scanning the entire segment. This means an O(n) operation, and that’s unacceptable.

  • Disk usage is unbounded:
    Updates and deletes leave garbage bytes behind on disk forever, so eventually we run out of space.

In this final article, it’s time to tackle both issues.

We will construct a fast off-heap hash index using Panama’s VarHandle to solve the read problem, eliminate GC pressure from the hot path, and then use the index to build a Compactor as a garbage collector for our disk.


Indexing: Finding a Needle in a Gigabyte Haystack

To make reads fast, we need a shortcut to avoid full sequential reads: an index.

The obvious Java solution would be a HashMap<String, Long>, rebuilt on startup by scanning the segment. But remember our 5-million-object HashMap problem?

It’s a recipe for GC disaster, even if only the index lives on-heap rather than the data itself. For a systems-level component like a storage engine index, the heap is just as hostile as it is for the data.

The solution: an off-heap linear-probed hash table.

Keeping the index off-heap means there are zero additional objects for the GC to trace.

The Design: An Off-Heap Linear-Probed Hash Table

Like with StorageSegment, we allocate a contiguous MemorySegment off-heap to serve as our hash table.

By managing the memory layout ourselves, we make the entire index invisible to the GC, giving us the predictable performance we want.

The downside of managing the hash table’s memory ourselves, though, is that we need… well… to manage the hash table.

A hash table needs a strategy for handling collisions, when two different keys hash to the same slot. Java’s HashMap uses a method called “separate chaining,” in which each slot, or bucket, can hold multiple entries in a list or tree.

We use a simpler, more direct approach: open addressing with linear probing.

It’s easy to implement and very fast for mostly empty tables, because it creates highly predictable memory access patterns (a “sequential scan” of a small memory region), which modern CPUs love.

Its primary weakness is clustering. As the table gets full, long chains of occupied slots form, degrading lookups. We mitigate clustering by enforcing a conservative load factor and resizing before the table becomes dense.
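
As a rough illustration of that guard, a resizing index might decide to grow like the sketch below. This is not part of the repository code; the helper names and the 0.70 threshold are assumptions.

java
// Illustration only: the index built below has a fixed capacity and simply throws
// when it fills up. A resizing variant could track occupancy and grow before
// probe chains get long. Names and threshold are assumptions, not repository code.
static boolean shouldGrow(long occupiedSlots, long capacity) {
  return (double) occupiedSlots / capacity >= 0.70; // conservative load factor
}

static long nextCapacity(long capacity) {
  return capacity * 2; // doubling keeps the table sparse and amortizes the rehash cost
}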

The Memory Layout: Fixed, Compact, Predictable

In a database, RAM is the most expensive resource, so we must keep each index entry as lean as possible.

Thinking about the actual design, we definitely need the key and the data offset. But keys have variable size, so storing a fixed-size hash of the key instead makes more sense. This keeps the index compact and the entry size predictable.

But we might also decide to add the data length to the index, so we can load the record right after the index lookup. However, remember that our data on disk is self-describing (the header contains the length).

Therefore, our index only needs to tell us where the data starts. This allows us to pack every index entry into CPU-cache-friendly 16 bytes:

+-------------------------------+-------------------------------+
|      Key Hash (8 bytes)       |      Record Offset (8 bytes)  |
+-------------------------------+-------------------------------+
|  (long) XXHash64 of the key.  |   (long) The absolute byte    |
|    0 denotes an empty slot.   |  offset in the segment file.  |
+-------------------------------+-------------------------------+
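
As a side note, the same 16-byte slot could also be described declaratively with the FFM API’s MemoryLayout. The implementation below sticks to plain offset constants, so treat this as an optional alternative sketch rather than the repository code:

java
// MemoryLayout and ValueLayout come from java.lang.foreign.
static final MemoryLayout SLOT_LAYOUT = MemoryLayout.structLayout(
    ValueLayout.JAVA_LONG.withName("keyHash"),
    ValueLayout.JAVA_LONG.withName("recordOffset"));

// The layout recovers the numbers we otherwise hard-code by hand:
static final long SLOT_SIZE  = SLOT_LAYOUT.byteSize();                                                       // 16, matches BYTES_PER_SLOT below
static final long HASH_OFF   = SLOT_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("keyHash"));      // 0, matches HASH_OFFSET
static final long RECORD_OFF = SLOT_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("recordOffset")); // 8, matches RECORD_OFFSET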

One downside of using hashes is that a production system would need to verify the full key on disk during read. For the article, we treat a hash match as an assumed match, to keep it simple.

Why Hash Quality Matters

In an off-heap index, the hash is not an implementation detail… it is the key.

Poor distribution causes clustering, which directly degrades lookup performance under linear probing. That makes hash quality a first-order concern:

  • Speed:
    It must be incredibly fast. We call it on every put, get, and delete operation.

  • Distribution:
    It must spread keys evenly across available buckets to avoid collisions. A good hash function exhibits a strong avalanche effect.

Java’s built-in String.hashCode() is decent for general use, but it was never designed for the high-performance, collision-resistant needs of a low-level hash table.

That’s where specialized algorithms like xxHash and MurmurHash shine. They’re orders of magnitude faster and provide better statistical quality—standard practice in high-performance systems from databases to game engines.

We’re going to use xxHash64 from the net.openhft:zero-allocation-hashing library. It’s battle-tested and gives us both speed and excellent hash distribution.
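
Using it is a one-liner. A minimal usage sketch (the example key is purely illustrative):

java
// LongHashFunction comes from net.openhft.hashing (zero-allocation-hashing).
LongHashFunction xx = LongHashFunction.xx();

byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
long h1 = xx.hashBytes(key);
long h2 = xx.hashBytes(key);

// Deterministic: the same bytes always map to the same 64-bit value,
// which is exactly what our index relies on across rebuilds.
assert h1 == h2;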

Implementing the Index: VarHandle in Action

VarHandles are precompiled, strongly typed accessors for structured memory layouts. Because of that, the JIT can often reduce them to minimal machine instructions with no method call overhead, which is excellent for hot paths.

The index implementation uses the same VarHandle for both the hash and the offset, as both are long:

java
public class OffHeapIndex implements AutoCloseable {

  private static final long EMPTY_HASH      = 0L;

  // Minimal record representing an index entry
  record IndexEntry(long keyHash, long recordOffset) {

    public boolean isEmpty() {
      return this.keyHash == EMPTY_HASH;
    }
  }

  // Slot Layout Constants
  // 16 bytes per slot: [ Hash(8B) | Offset(8B) ]
  private static final long BYTES_PER_SLOT   = 16L;
  private static final long HASH_OFFSET      = 0L;
  private static final long RECORD_OFFSET    = 8L;

  // VarHandle for optimized access to our slot structure.
  // One is enough: "a long at a byte offset"
  private static final VarHandle LONG_HANDLE =
      ValueLayout.JAVA_LONG_UNALIGNED.varHandle();

  // Memory Access
  private final Arena arena;
  private MemorySegment indexSegment;
  private final long capacity; // in slots

  // Hashing
  private static final LongHashFunction XX_HASH = LongHashFunction.xx();

  public OffHeapIndex(long capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be > 0");
    }

    this.arena = Arena.ofConfined();
    this.capacity = capacity;

    // Allocate the off-heap memory.
    // It's automatically zero-filled.
    this.indexSegment = arena.allocate(capacity * BYTES_PER_SLOT);
  }

  @Override
  public void close() {
    // ...
  }
}

Repository: OffHeapIndex.java

The put Method: Finding a Home for a Key

This is where we implement the linear probing logic.

The core is a loop that scans for an empty slot. To prevent an infinite loop if the index is full, we bound the search to the table’s capacity. In a production system, we would resize/grow the table here.

Linear Probing Example (capacity = 8):

Index Table (each slot = 16 bytes: hash + offset):
+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+-----+-----+

STEP 1: Insert key "foo" -> hash % 8 = 3
+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | foo |  4  |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+-----+-----+

STEP 2: Insert key "bar" -> hash % 8 = 3 (collision!)
        Check slot 3: occupied
        Check slot 4: empty -> insert here
+-----+-----+-----+-----+-----+-----+-----+-----+
|  0  |  1  |  2  | foo | bar |  5  |  6  |  7  |
+-----+-----+-----+-----+-----+-----+-----+-----+
                     ^     ^
             initial |     |  probed
               slot  |     | to here

Now let’s implement this using VarHandle, the high-performance memory accessor I mentioned in the first article.

VarHandles give us typed, JIT-friendly access to raw memory without the overhead of method calls, making them perfect for our hot-path index operations.

java
// Reserve 0 as EMPTY.
// If hash returns 0, map it to 1.
private long normalizeHash(long h) {
  return (h == EMPTY_HASH) ? 1L : h;
}

public void put(byte[] key, long recordOffset) {

  // STEP 1: Hash the key
  long keyHash = XX_HASH.hashBytes(key);
  keyHash = normalizeHash(keyHash);

  // STEP 2: Make sure we have a positive slot index
  long startSlot = Math.floorMod(keyHash, this.capacity);

  for (long i = 0; i < this.capacity; i++) {
    // STEP 3: Find slot candidate
    long slot = (startSlot + i) % this.capacity;
    long slotOffset = slot * BYTES_PER_SLOT;

    // STEP 4: Use VarHandle to read the hash in the current slot
    long currentHash = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + HASH_OFFSET);
    
    // STEP 5: Check if the slot is empty
    if (currentHash == EMPTY_HASH) {
      // STEP 6: Use VarHandle to write index entry data
      LONG_HANDLE.set(this.indexSegment, slotOffset + HASH_OFFSET, keyHash);
      LONG_HANDLE.set(this.indexSegment, slotOffset + RECORD_OFFSET, recordOffset);
      return;
    }

    // STEP 7: Hash collision? Overwrite (models "latest wins" updates for the demo)
    if (currentHash == keyHash) {
      LONG_HANDLE.set(this.indexSegment, slotOffset + RECORD_OFFSET, recordOffset);
      return;
    }
  }
  
  // If the loop completes, the index is full.
  throw new IllegalStateException("Index is full!");
}

Repository: OffHeapIndex.java

Note the use of Math.floorMod() to map the hash to a slot index. While hashes can be negative, array indices cannot.
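
A quick illustration of why the plain % operator would not be safe here:

java
long hash = -7L;

long plainMod = hash % 8;               // -7: Java's % keeps the sign of the dividend
long floored  = Math.floorMod(hash, 8); //  1: always lands in the range [0, 8)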

The get method follows the exact same linear-probing path as put. This symmetry in logic is one of the elegant features of open addressing.

The search terminates if it finds a matching hash or hits an empty slot, which means the key cannot exist further down the probe chain in this table.

java
public Optional<IndexEntry> get(byte[] key) {

  // STEP 1: Hash the key
  long keyHash = XX_HASH.hashBytes(key);
  keyHash = normalizeHash(keyHash);

  // STEP 2: Find positive slot index
  long startSlot = Math.floorMod(keyHash, this.capacity);

  for (long i = 0; i < capacity; i++) {
    // STEP 3: Find slot candidate
    long slot = (startSlot + i) % capacity;
    long slotOffset = slot * BYTES_PER_SLOT;

    // STEP 4: Use VarHandle to read the hash in the current slot
    long currentHash = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + HASH_OFFSET);

    // STEP 5: Check if slot is empty
    if (currentHash == EMPTY_HASH) {
      return Optional.empty(); // Key not found in index
    }

    // STEP 6: We found a candidate!
    if (currentHash == keyHash) {
      long recordOffset = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + RECORD_OFFSET);

      // In a real system, we would go to the segment file at `recordOffset`
      // and compare the full key to resolve hash collisions.
      return Optional.of(new IndexEntry(keyHash, recordOffset));
    }
  }

  // Scanned the whole table, no match.
  return Optional.empty();
}

Repository: OffHeapIndex.java

The get logic makes a significant simplification: if hashA == hashB, then Key A == Key B. With 64-bit hashes, collisions are rare, but they do happen.

In a production-grade system, however, we must verify the match by reading the actual key bytes from disk and comparing them:

  1. Find a candidate slot in the index by matching the hash.
  2. Use the recordOffset to read the full key from the StorageSegment file.
  3. Compare the full key from the file against the key being searched for.
  4. If they are identical, we have a true match. If not, it was a collision, and we must continue the linear probe in the index.

We are omitting this step here to keep the series focused, but this is a non-negotiable requirement for correctness in a real-world system!
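
To make that concrete, here is one possible shape for such a collision-aware lookup. The extra LongPredicate parameter (from java.util.function) is hypothetical and not part of the repository code; the caller would implement it by reading the record at the given offset and comparing the stored key with the one being searched for:

java
// Hypothetical variant, not in the repository: the index only proposes candidates,
// and a caller-supplied check (which reads the full key from disk) confirms them.
public Optional<IndexEntry> get(byte[] key, LongPredicate keyMatchesRecordAt) {
  long keyHash = normalizeHash(XX_HASH.hashBytes(key));
  long startSlot = Math.floorMod(keyHash, this.capacity);

  for (long i = 0; i < this.capacity; i++) {
    long slotOffset = ((startSlot + i) % this.capacity) * BYTES_PER_SLOT;
    long currentHash = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + HASH_OFFSET);

    if (currentHash == EMPTY_HASH) {
      return Optional.empty(); // end of the probe chain, key cannot exist
    }

    if (currentHash == keyHash) {
      long recordOffset = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + RECORD_OFFSET);

      // Accept the slot only if the bytes on disk really are our key.
      // Otherwise it was a 64-bit collision: keep probing.
      if (keyMatchesRecordAt.test(recordOffset)) {
        return Optional.of(new IndexEntry(keyHash, recordOffset));
      }
    }
  }

  return Optional.empty();
}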

An (Almost) Complete OffHeapIndex

Our index is now almost complete.

It’s another powerful, self-contained data structure that shows we can build sophisticated, C-style data structures in safe, modern Java. The FFM API has enabled us to sidestep the GC entirely for our critical hot path.

Let’s wire it into our StorageSegment and create a complete key-value store.


Wiring It All Together: The KeyValueStore

The KeyValueStore acts as the coordinator: it accepts writes, sends them to the StorageSegment (Durability), and then updates the Index (Searchability).

+------------------------+
|     KeyValueStore      |
+------------------------+
|  +------------------+  |   Manages   +------------------+
|  |   OffHeapIndex   |<>------------->|  StorageSegment  |
|  |  (Fast Lookups)  |  |             |  (Durable Data)  |
|  +------------------+  |             +------------------+
+------------------------+

Reads consult the index, then fetch the record directly.

In code form, it looks like this:

java
public class KeyValueStore implements AutoCloseable {

  private StorageSegment segment;
  private OffHeapIndex   index;

  public KeyValueStore(Path segmentPath, long segmentSize, long indexCapacity) throws IOException {
    this.segment = new StorageSegment(segmentPath, segmentSize, 0);
    this.index = new OffHeapIndex(indexCapacity);
    // On startup, we would need to scan the segment to rebuild the index.
  }

  public void put(DataRecord dataRecord) {

    // STEP 1: Write to the log
    long offset = segment.append(dataRecord);

    // STEP 2: Update the index
    index.put(dataRecord.key(), offset);
  }

  public Optional<DataRecord> get(byte[] key) throws IOException {
    // STEP 1: Consult the off-heap index
    Optional<OffHeapIndex.IndexEntry> entry = index.get(key);

    if (entry.isEmpty()) {
      return Optional.empty();
    }

    // STEP 2: Retrieve from disk using the offset
    long offset = entry.get().recordOffset();
    DataRecord record = segment.read(offset);

    // (Optional but recommended: verify record.key() equals key here)
    return Optional.of(record);
  }

  @Override
  public void close() {
    this.index.close();
    this.segment.close(); 
  }
}

Repository: KeyValueStore.java

All parts fit together nicely!

Let’s write a simple main method to see our creation in action. The slow, painful scan is gone, replaced by an indexed lookup.

java
public static void main(String[] args) throws Exception {
  Path file = Files.createTempFile("kvstore-", ".segment");
  try (var store = new KeyValueStore(file, 4096, 100)) {
    store.put(DataRecord.of("Key1", "Value1"));
    store.put(DataRecord.of("Key2", "Value2"));
    store.put(DataRecord.of("Key3", "Value3"));

    Optional<DataRecord> val = store.get("Key2".getBytes());

    System.out.println("Lookup for 'Key2' found: " +
      val.map(dr -> new String(dr.value())).orElse("Not found!"));
  }
}

Repository: Main.java

Lookups are now O(1) average case, regardless of how many records exist. We have achieved fast reads and fast writes.

This index is intentionally minimal and makes a few simplifying assumptions to keep the focus on Panama + off-heap mechanics:

  • It’s single-threaded (no atomic/volatile VarHandle operations).

  • It treats a 64-bit hash match as an assumed key match (so we ignore rare hash collisions).

  • It reserves 0 as the “empty slot” sentinel (so we normalize hashes to never be 0).

  • It uses linear probing with an overwrite-on-same-hash rule, so updates don’t create duplicate entries in the probe chain.

None of these choices are hard to evolve later, but being explicit keeps the demo code honest.

However, our KeyValueStore constructor is still quite naïve. When the application restarts, the OffHeapIndex is created empty. It has no knowledge of the records that already exist in the StorageSegment file.

A complete implementation would need a recovery process: on startup, it must scan the StorageSegment file from beginning to end and use the data to rebuild the OffHeapIndex in memory.

We will deliberately omit this complex recovery logic to stay focused and keep the article short-ish. But keep in mind that in a real system, this step is another non-negotiable part for ensuring consistency.
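
For completeness, the shape of that recovery scan could look roughly like the sketch below. It assumes a hypothetical RecordCodec.encodedSize(record) helper that reports a record’s full on-disk footprint; the repository does not provide it:

java
// Rough sketch of startup recovery (deliberately omitted from the series).
private void rebuildIndex() throws IOException {
  long offset = 0;

  // Replay the log from the beginning. Later entries overwrite earlier ones,
  // so after the scan the index points at the latest version of every key.
  while (offset < segment.size()) {
    DataRecord record = segment.read(offset);
    index.put(record.key(), offset);
    offset += RecordCodec.encodedSize(record); // hypothetical helper, not in the repo
  }

  // A production version would also stop (and truncate) at the first CRC failure.
}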


Infinite Growth: Reclaiming Space

We solved the speed problem, but we still have a space problem.

If we call put repeatedly with the same key, every version remains on disk permanently. Only the index knows the truth: it points to the latest record, making every older version unreachable garbage.

Segment File on Disk:
+-----------------+-----------------+-----------------+-----------------+
| Record (k1, v1) | Record (k2, v1) | Record (k1, v2) | Record (k2, v2) |
+-----------------+-----------------+-----------------+-----------------+
  ^                 ^                 ^                 ^
  |-- GARBAGE --|   |-- GARBAGE --|   |--LIVE DATA--|   |--LIVE DATA--|

Index in Memory:
- "k1" -> points to (k1, v2)
- "k2" -> points to (k2, v2)

Half of our file is wasted on obsolete versions, but our code has no way to reclaim it.

To actually reclaim this space, we need a garbage collector for our disk files. In database terms, this is the Compactor.

This design rests on a few core principles working together.

Segmentation, Immutability, and Global Addressing

As we can’t clean up a file we’re currently writing to, we need to stop writing to a single, infinitely growing log file. Instead, we’ll write to a sequence of fixed-size segments.

When a segment reaches its capacity, we “seal” it, making it immutable, and start writing to a new, active segment. This immutability is key to safety: we can read and process old segments without worrying they will change beneath us.

With multiple files, however, we need a new way to address data. The index no longer stores an offset within a single file, but a global offset, which means the StorageSegment constructor must learn about its position in that global address space.

The Mighty Tombstone

As we introduced in Article II, deletes in an append-only system work by appending tombstone markers (empty-value records).

When the compactor sees a tombstone, it knows to discard that key and all its previous values. The tombstone is a “delete” instruction for the garbage collector.

Migration

The core of compaction is a migration process.

Instead of naïvely scanning gigabytes of old segment files, we will use our existing OffHeapIndex as the map of live data. This approach only reads the data that is potentially live, reducing I/O compared to a full scan.

Because we treat hash matches as definitive, compaction correctness depends on collision-free behavior for this demo.

Implementing The Algorithm

Compaction might sound complex at first, but it’s beautifully simple if we trust our Index.

We don’t need to scan the old files looking for garbage. Instead:

  1. Iterate the Off-Heap Index
    It only points to the latest, surviving version of every key.

  2. Read the Data
    Resolve the pointer to the old segment files.

  3. Filter Tombstones
    If the value is “empty” (a tombstone), the key is effectively deleted. Do not copy it.

  4. Write
    Append the survivors to a new, fresh StorageSegment.

  5. Swap
    Atomically switch to the new segment and discard the old files.

Now that we know all the parts needed to clean up after ourselves, it’s time for a big refactor.


The Big Refactor: Managing the Universe of StorageSegments

Transitioning from a single log file to a segmented system is our biggest architectural leap. Instead of handling a single StorageSegment, the KeyValueStore must act as a virtual memory manager, translating global offsets into specific file reads.

The New StorageSegments Structure

Our Store now manages a timeline of files.

The active segment is the tip of the spear that receives all writes.
The sealed segments are the immutable history and candidates for compaction.

We track a global write-offset, a monotonically increasing number representing the total bytes written since the dawn of the database. This allows us to assign a unique, logical address to every byte, even across file boundaries.

java
public class KeyValueStore implements AutoCloseable {

  // The Active Head: Accepts new writes
  private StorageSegment activeSegment;

  // The Immutable History: Ordered list of old segments
  // (Newest to Oldest for faster lookup of recent data)
  private final List<StorageSegment> sealedSegments = new ArrayList<>();

  // Where does the current active segment start in the global timeline?
  // Example: If we sealed 2 segments of 1000 bytes, this is 2000.
  private long activeSegmentBaseOffset = 0;

  private final long maxSegmentSize;
  private final Path dataDirectory;

  // ...
}

Repository: KeyValueStore.java

To make routing between segments work, each StorageSegment needs to know its own base offset in the global address space:

java
public class StorageSegment implements AutoCloseable {
  private final long baseOffset;

  public StorageSegment(Path path, long maxSize, long baseOffset) throws IOException {
    // existing setup...
    this.baseOffset = baseOffset;
  }

  public long getBaseOffset() {
    return this.baseOffset;
  }
}

Repository: StorageSegment.java

The append logic becomes responsible for enforcing our file size limits.

Before writing, we check if the current segment is full. If it is, we “roll” it: flush it, move it to the sealed list, and open a fresh file.

java
public long append(DataRecord record) throws IOException {
  // STEP 1: Check if we need to roll
  if (this.activeSegment.size() > maxSegmentSize - RecordCodec.HEADER_SIZE - record.value().length) {
    flushAndRoll();
  }

  // STEP 2: Append to the physical file (returns local file offset, e.g., 50)
  //         and compute the GLOBAL address (e.g., Base 1000 + Local 50 = 1050).
  long localOffset = this.activeSegment.append(record);
  long globalOffset = this.activeSegmentBaseOffset + localOffset;

  // STEP 3: Update the index. It stores the GLOBAL address ("1050").
  index.put(record.key(), globalOffset);

  return globalOffset;
}

private void flushAndRoll() throws IOException {
  // STEP 1: Force the old data to disk (durability)
  this.activeSegment.flush();

  // STEP 2: Move current segment to history
  this.sealedSegments.add(this.activeSegment);

  // STEP 3: Update the base pointer for the next file
  this.activeSegmentBaseOffset += this.activeSegment.size();

  // STEP 4: Create new file (e.g., "3.log")
  int nextId = this.sealedSegments.size();
  Path nextPath = this.dataDirectory.resolve(String.format("%d.log", nextId));
  this.activeSegment = new StorageSegment(nextPath, this.maxSegmentSize, this.activeSegmentBaseOffset);
}

Repository: KeyValueStore.java

With writing done, next is the most critical piece of logic: reading.

For example, when we have two segment files:

  • File A (Base 0, Size 1000)
  • File B (Base 1000, Size 1000) -> Active

The index gives us a raw number: 1050. The store must route the request to the correct file.

java
public Optional<DataRecord> get(byte[] key) throws IOException {
  // STEP 1: Ask Index for global offset
  Optional<OffHeapIndex.IndexEntry> entry = index.get(key);
  if (entry.isEmpty()) {
    return Optional.empty();
  }

  long globalOffset = entry.get().recordOffset();

  // STEP 2: Find the record using a helper (we will reuse this helper in compaction)
  DataRecord record = readRecord(globalOffset);

  // STEP 3: Check if it is deleted
  if (record.value().length == 0) {
      return Optional.empty();
  }

  // STEP 4: We found the record
  return Optional.of(record);
}

// Helper: Routes a global offset to the correct physical segment
private DataRecord readRecord(long globalOffset) throws IOException {
  // Case A: Active StorageSegment?
  if (globalOffset >= this.activeSegmentBaseOffset) {
    long localOffset = globalOffset - this.activeSegmentBaseOffset;
    return this.activeSegment.read(localOffset);
  }

  // Case B: Sealed StorageSegment? (Iterate backwards: Newest -> Oldest)
  for (int i = this.sealedSegments.size() - 1; i >= 0; i--) {
    StorageSegment seg = this.sealedSegments.get(i);
    // Each sealed segment knows its own base offset in the global address space
    if (globalOffset >= seg.getBaseOffset()) {
        return seg.read(globalOffset - seg.getBaseOffset());
    }
  }

  throw new IOException("Corrupt Index: Offset points to nowhere");
}

Repository: KeyValueStore.java

Last thing needed before tackling compaction is tombstones.

Since we are Append-Only, we cannot “remove” a key. We can only “append a note” saying it is deleted.

java
private static final byte[] TOMBSTONE_VALUE = new byte[0];

public void delete(byte[] key) throws IOException {
  // STEP 1: Append a record with empty value
  DataRecord tombstone = new DataRecord(key, TOMBSTONE_VALUE);
  long offset = this.activeSegment.append(tombstone);

  // STEP 2: Update index to point to this tombstone
  //         When someone calls get(...), they read this record, see len=0, and treat as missing.
  this.index.put(key, this.activeSegmentBaseOffset + offset);
}

Repository: KeyValueStore.java

As with many other approaches in this series, using a zero-length DataRecord as a tombstone is a simplification. In a real system, this would prevent storing genuinely empty values. Tombstones also don’t free index slots directly; those are only reclaimed when compaction rebuilds the table.
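
One possible alternative, sketched here purely for illustration, is to spend a header flag on deletion so that genuinely empty values stay representable. This is not the record format used in this series:

java
// Hypothetical record shape with an explicit flags byte (not the format from Part II).
record FlaggedRecord(byte[] key, byte[] value, byte flags) {

  static final byte FLAG_TOMBSTONE = 0x01;

  boolean isTombstone() {
    return (flags & FLAG_TOMBSTONE) != 0;
  }

  static FlaggedRecord tombstoneFor(byte[] key) {
    return new FlaggedRecord(key, new byte[0], FLAG_TOMBSTONE);
  }
}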

Now for the main event.

The Compact Implementation

We need to iterate over our OffHeapIndex, which is a raw hash table.

First, let’s give the index the ability to scan its own slots.

java
// Functional interface for the callback
public interface IndexVisitor {
  void visit(long keyHash, long globalOffset);
}

public void forEach(IndexVisitor visitor) {
  // Scan every slot in the huge memory segment
  for (long i = 0; i < capacity; i++) {
    long slotOffset = i * BYTES_PER_SLOT;
    long hash = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + HASH_OFFSET);

    // If hash is not 0 (EMPTY), it's a live entry
    if (hash != 0) {
      long offset = (long) LONG_HANDLE.get(this.indexSegment, slotOffset + RECORD_OFFSET);
      visitor.visit(hash, offset);
    }
  }
}

Repository: OffHeapIndex.java

With the visitor in hand, we can finally create the compact() method. It creates a fresh world and moves only the survivors into it.

The implementation follows a simple four-step flow:

  1. Create a fresh segment and index
  2. Iterate all live entries, filter tombstones, and write survivors
  3. Flush everything to disk
  4. Atomically swap the old world for the new

java
public void compact() throws IOException {
  System.out.println("Starting Compaction (Stop-The-World)...");

  // STEP 1: Create a FRESH world.
  //         A completely new segment file starting at global offset 0 (simplification)
  int compactedId = this.sealedSegments.size() + 1;
  Path compactedPath = dataDirectory.resolve(String.format("%d.log", compactedId));
  StorageSegment compactSegment = new StorageSegment(compactedPath, this.maxSegmentSize, 0);
  OffHeapIndex compactIndex = new OffHeapIndex(index.getCapacity());
  boolean swapped = false;

  try {
    // STEP 2: Migration Loop
    //         Iterate ONLY the live keys.
    //         Garbage is naturally ignored.
    index.forEach((keyHash, globalOffset) -> {
      try {
        // STEP 2A: Read the Record
        //          Uses our routing helper from earlier to find the data.
        //          Critically: we read the HEADER here to know the length.
        DataRecord record = readRecord(globalOffset); 

        // STEP 2B: Filter Tombstones (The Garbage Collection moment)
        //          If it's a tombstone, we do NOTHING. It vanishes.
        if (record.value().length == 0) {
          return; // Drop it. It's dead.
        }

        // STEP 2C: Write to new StorageSegment
        long newLocalOffset = compactSegment.append(record);

        // STEP 2D: Update New Index (Global Offset 0 + local)
        //           Since this is a fresh world, GlobalOffset 0 == LocalOffset 0
        compactIndex.put(record.key(), newLocalOffset);

      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });

    // STEP 3: Flush to disk
    compactSegment.flush();

    // STEP 4: Simplified Swap
    // Close everything
    this.index.close();
    this.activeSegment.close(); 
    this.sealedSegments.forEach(StorageSegment::close);

    // Delete old/empty files
    // ... Files.delete(...) logic here ...

    // Swap references
    this.index = compactIndex;
    this.activeSegment = compactSegment;
    this.activeSegmentBaseOffset = 0; // Reset to 0
    this.sealedSegments.clear();
    swapped = true;

    System.out.println("Compaction Complete. Disk is clean.");
  } finally {
    if (!swapped) {
      compactIndex.close();
      compactSegment.close();
    }
  }
}

Repository: KeyValueStore.java

This effectively rewrites history. If we updated a key 500 times, only the 501st version is copied. The other 500 versions—and their bytes on disk—are obliterated during compaction.

Testing Reality

To validate that compaction actually reclaims space, we can run a simple experiment: write the same key 50 times (creating 49 garbage records), trigger compaction, and measure the disk usage before and after.

Here’s a simple main method showing what compaction does and how space is saved.

java
// Main.java
public static void main(String[] args) throws Exception {
  // Assume KV store now manages its directory
  Path dbPath = Files.createTempDirectory("kv-compact-");

  try (KeyValueStore store = new KeyValueStore(dbPath, 1024, 100)) {
    // Overwrite a key many times to create garbage
    for (int i = 0; i < 50; i++) {
      String value = "value_" + i;
      store.put(DataRecord.of("mykey", value));
    }

    long sizeBefore = store.sizeOnDisk();
    System.out.printf("Size before compaction: %,d bytes%n", sizeBefore);

    store.compact(); // Manually trigger compaction

    long sizeAfter = store.sizeOnDisk();
    System.out.printf("Size after compaction:  %,d bytes%n", sizeAfter);

    // Verify data integrity
    Optional<DataRecord> maybeRecord = store.get("mykey".getBytes());
    maybeRecord.map(DataRecord::value)
        .map(String::new)
        .ifPresent(val -> System.out.println("Final value: " + val));
  }
}

Repository: Main.java

This reduction is a strong signal that compaction works for this simplified design—49 obsolete versions vanished, only the live data remains, and the final value is still intact.
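
The main method above uses sizeOnDisk(), which we never showed. A minimal version, assuming we only care about the bytes written to each segment, might look like this:

java
// Minimal sketch: sum the bytes written to the active segment and all sealed ones.
// A variant based on Files.size(path) would report physical file sizes instead.
public long sizeOnDisk() {
  long total = this.activeSegment.size();
  for (StorageSegment sealed : this.sealedSegments) {
    total += sealed.size();
  }
  return total;
}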

Beyond unit tests: Does Our Storage Engine Actually Work?

We have built a system involving memory mapping, raw bytes, struct manipulation, checksums, hash collisions, and file management.

Throughout, we tested tiny parts of it, but not its robustness.

Systems software like our storage engine benefits from model-based testing: run thousands of randomized operations against both our engine and a trusted reference (a HashMap “oracle”), comparing results after each operation.

Periodically inject crashes by abruptly closing and reopening the store. If the states ever diverge, we have a bug.

This technique, sometimes called “chaos testing” or “property-based testing”, is how production storage systems gain confidence in their correctness. Libraries like jqwik or QuickTheories make this practical in Java.

Creating a full chaos harness could easily be an article in itself. As this series is already as long as it is, we keep this part theoretical and as a possible exercise for the reader.
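
For readers who want to take up that exercise, a bare-bones starting point (plain Java, no framework) might look like the sketch below. It assumes the put/get/delete signatures from this article; everything else is illustrative:

java
// Randomized oracle test: the engine must always agree with a plain HashMap.
static void fuzz(KeyValueStore store) throws Exception {
  var oracle = new java.util.HashMap<String, String>();
  var rnd = new java.util.Random(42);

  for (int i = 0; i < 10_000; i++) {
    String key = "key-" + rnd.nextInt(100);

    if (rnd.nextBoolean()) {
      String value = "value-" + i;
      store.put(DataRecord.of(key, value));
      oracle.put(key, value);
    } else {
      store.delete(key.getBytes());
      oracle.remove(key);
    }

    // After every operation, engine and oracle must agree on this key.
    var actual = store.get(key.getBytes()).map(r -> new String(r.value()));
    var expected = java.util.Optional.ofNullable(oracle.get(key));
    if (!actual.equals(expected)) {
      throw new AssertionError("Divergence at operation " + i + " for key " + key);
    }

    // A fuller harness would also periodically close and reopen the store here
    // to simulate crashes, once recovery is implemented.
  }
}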


What We Built (Sustainable, But Naïve)

Over these three articles, we went from an idea and a blank file to a legitimate storage engine. We’ve built something that would’ve been considered impractical in Java not long ago:

  • Off-heap data storage with explicit memory management
  • Predictable latency without GC interference on the hot-path
  • Off-heap indexing
  • Crash-resilient record format

All achieved in modern Java without any Unsafe, JNI, or wishful thinking.

  • Article 1 gave us Speed:
    Utilizing the Panama FFM API to bypass the Heap and map files directly to memory.

  • Article 2 gave us Resilience:
    Designing a binary format with CRCs to prevent torn writes and detect corruptions.

  • Article 3 gave us Utility:
    Adding an Off-Heap Index for fast lookups and Compaction for sustainable disk usage.

Despite its naïve implementation, our store is now architecturally complete for what we set out to achieve. It handles the full data lifecycle: put, get, delete, and the cleanup/compaction to sustain itself.

That’s already quite a lot of functionality. The goal of the series was never to produce a feature-complete database, and we didn’t build one: we still lack a full durability contract (commits, a persisted write pointer, a crash recovery procedure, and atomic file swaps). Instead, we hope we demonstrated that Java no longer forces us to choose between safety and control.

With Panama, the language has crossed an important threshold, opening up systems programming to us. It won’t replace other systems languages anytime soon, but now we have a new tool set at our fingertips. Whether we use that power wisely, or recreate the same old footguns with nicer APIs… that’s now on us.

What We Didn’t Build

Our engine is architecturally complete for what we set out to achieve, but production systems need more. To give you the full picture, we should also talk about what we deliberately left out:

  • Crash Recovery
    On restart, we’d need to scan the segment file, rebuild the index, and handle partially-written records (e.g., stop at the first CRC failure, truncate). Without this, restarts lose all indexed state.

  • Concurrent Access
    Our confined arenas and single-threaded design are intentional simplifications. A real system needs concurrent shared arenas, atomic VarHandle operations (compareAndSet, getVolatile), and careful consideration of memory ordering.

  • Hash Collision Handling
    We treat hash equality as key equality. A 64-bit collision is rare, but not impossible. Production code must read the actual key bytes from disk and compare them when the hash matches.

  • Manifest / Write-Ahead State
    We have no persistent record of which segments exist or where the write pointer is. A manifest file (or header in segment 0) could track this, enabling safe recovery and atomic segment swaps during compaction.

  • Dynamic Resizing
    Our index has a fixed capacity. When it fills up, we throw an exception. A real index would resize (allocate larger segment, rehash all entries, swap).

  • Atomicity Guarantees
    Our compaction “swap” isn’t truly atomic. A crash mid-swap could leave inconsistent state. Production systems use techniques like write-ahead logging or two-phase commits.

Each of these is a natural extension of what we built. The patterns are in place and the engineering is straightforward (if sometimes tedious).


Interested in using functional concepts and techniques in your Java code?
Check out my book!
Available in English, Polish, and Korean.