What synchronized Actually Does

And When to Stop Using It


Asked for a one-line mental model of what synchronized does, most developers have a ready answer: it makes sure only one thread at a time can execute the synchronized block or method.

In most cases, this is enough to write synchronized code. It works, we ship it.

But that mental model is not enough to debug synchronized code when something goes wrong, to optimise it, or to know when to reach for something else instead.


How synchronized Works

Java can synchronize two things:

  • Methods locked to this for instance methods, or to the Class object (e.g. synchronized (MyClass.class)) for static methods
  • Blocks locking on any object expression

At its core, synchronized is built around so-called monitors, and every object has one. A monitor is a mutual exclusion lock combined with a wait/notify mechanism (that particular API is a topic for another article in itself).

What that means is that when a thread enters a synchronized method or block, it acquires the monitor of the target object. All other threads trying to acquire the same monitor block until the first one releases it.

The following code shows the two forms of synchronized. Both do the same thing, as the actual operation, incrementing the count, is synchronized on this in both cases.

First implicitly via the synchronized method, and then explicitly via the block-syntax:

java
public synchronized void increment() {
    this.count++;
}

public void increment() {
    synchronized (this) {
        this.count++;
    }
}

Regardless of which variant we choose, we get two guarantees:

  • Mutual exclusion:
    Only one thread executes the method/block.

  • Visibility:
    Any changes made inside a synchronized method/block are published across CPU caches, so the next thread to acquire the same monitor sees those changes. This is part of the “happens-before” guarantee of Java’s memory model.
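
A minimal sketch of the visibility guarantee in action, assuming nothing beyond the JDK: a worker thread polls a flag through a synchronized getter, so the write performed in stop() is guaranteed to become visible to it:

```java
// Sketch: visibility via a shared monitor. Both accessors synchronize on
// the same object, so stop() happens-before the read that observes it.
public class StopFlag {

    private boolean stopped = false;   // guarded by the instance monitor

    public synchronized void stop()       { this.stopped = true; }
    public synchronized boolean stopped() { return this.stopped; }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();

        Thread worker = new Thread(() -> {
            // The synchronized getter establishes happens-before with stop(),
            // so the update is guaranteed to become visible here.
            while (!flag.stopped()) {
                Thread.onSpinWait();
            }
        });
        worker.start();

        Thread.sleep(100);
        flag.stop();          // visible to the worker via the shared monitor
        worker.join(2000);
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```

Without the synchronized accessors (or a volatile field), the JIT would be free to hoist the read out of the loop, and the worker could spin forever on a stale value.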

Simple enough!

Does that mean Java solved concurrency issues with a single keyword?

No.

Even though “it just works”™ most of the time for the simple stuff, there are a variety of traps we can fall into.


A Field Guide To synchronized Pitfalls

Synchronizing on the wrong thing

Leaking the Lock

Synchronizing on this, as is always the case for synchronized methods and easy to do with blocks, feels like the natural thing to do. The object knows best about its own state, so it should lock on itself.

That assumption leads to code that looks “safe” at first glance:

java
public class SessionCache {

    private final Map<String, Object> cache = new HashMap<>();

    public synchronized void put(String key, Object value) {
        this.cache.put(key, value);
    }

    public synchronized Object get(String key) {
        return this.cache.get(key);
    }
}

For each SessionCache instance, only one thread can access or change the cache.

But there’s a problem: synchronizing on this means exposing the monitor to the outside world.

Anyone holding a reference to a SessionCache instance can lock on it, meddling in the internal affairs of put and get:

java
// Somewhere completely different...
SessionCache cache = getSharedCache();

// Trying to acquire the SAME monitor as cache uses internally
synchronized (cache) {  
    // long-running operation holding the lock
    var snapshot = buildSnapshot(cache);

    // blocks for network I/O while holding the lock
    sendToRemote(snapshot);  
}

Every other call to cache.put() or cache.get() from other threads is now blocked for the duration of that synchronized block somewhere completely unrelated.

That’s definitely not what we want.

The fix is simple, though (unless we want to use something other than synchronized, as discussed later): locking on a private object that we fully own:

java
public class SessionCache {
    // Instance private object providing the monitor
    private final Object lock = new Object();

    private final Map<String, Object> cache = new HashMap<>();

    public void put(String key, Object value) {
        synchronized (this.lock) {
            this.cache.put(key, value);
        }
    }

    public Object get(String key) {
        synchronized (this.lock) {
            return this.cache.get(key);
        }
    }
}

The lock field is private and final, so no external code can acquire it, and the reference can’t be changed internally. We regained full control over who contends on the monitor.

Using this might feel natural, especially when using the synchronized keyword for methods. But we need to be aware that this isn’t just an internal detail, but the instance itself.

For our own code, that might still be sufficient, as we (hopefully) know what we’re doing. When designing a library or framework, however, where we don’t control who calls the code, dedicated lock objects are what ensure correctness.

Mutable lock references

Mutability always gets us in the long run when dealing with concurrency, and synchronized is no exception:

java
public class ConfigHolder {
    private Map<String, String> config = loadConfig();

    public String getProperty(String key) {
        synchronized (this.config) {
            return this.config.get(key);
        }
    }

    public void reload() {
        this.config = loadConfig();  // new object means new monitor
    }
}

After reload(), threads calling getProperty() synchronize on the new map while any thread that entered getProperty() before the swap still holds the monitor on the old map.

Two threads now believe they have exclusive access.

Neither does.

This is subtle because it works perfectly until reload() is called, and even then it only fails if a getProperty() call is in flight during the swap.

That’s why final on the lock field isn’t just style, it’s a correctness requirement the compiler won’t enforce for you.

IntelliJ will warn about this (Synchronization on a non-final field), but many codebases suppress or ignore the inspection.

Another benefit of making the lock final is avoiding a NullPointerException from a lazily initialised or injected lock reference, since synchronized (null) throws at runtime.

The interning trap

Imagine a multi-tenant system where each tenant gets their own lock:

java
public class TenantService {

    public void processOrder(Integer tenantId, Order order) {
        synchronized (tenantId) {
            // validate, persist, notify, do other stuff...
        }
    }
}

This looks like a per-tenant lock, as each tenantId gets its own lock, so different tenants won’t block each other.

In theory, that’s correct. But we used an Integer (uppercase I), meaning it’s a boxed type.

Boxed types are cached for certain ranges, e.g., -128 to 127 for Integer.

So calling processOrder(42, myOrderVar) autoboxes the tenant ID via Integer.valueOf(42), which returns a cached Integer instance.

That’s still a lock per tenant ID, so what’s the problem?

Any unrelated code that eventually synchronizes on Integer.valueOf(42), too (a cache index, a retry counter, anything really), will get the same monitor. It’s like the previous issue with locking on this, but worse!

The same is true for interned String instances and literals, other autoboxed types, and any canonicalized reference.
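
A common fix, sketched here with a hypothetical lockFor helper, is mapping each tenant ID to a dedicated lock object that we own, so no interned or cached instance is ever used as a monitor:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TenantService {

    // Minimal stand-in for the Order type used in the article
    record Order(String id) {}

    // One private lock object per tenant ID, never a shared, cached instance
    private final Map<Integer, Object> tenantLocks = new ConcurrentHashMap<>();

    Object lockFor(Integer tenantId) {
        // computeIfAbsent is atomic, so concurrent callers for the same
        // tenant always receive the same lock object
        return tenantLocks.computeIfAbsent(tenantId, id -> new Object());
    }

    public void processOrder(Integer tenantId, Order order) {
        synchronized (lockFor(tenantId)) {
            // validate, persist, notify, do other stuff...
        }
    }
}
```

Different tenants now genuinely get different monitors, and no unrelated code can ever stumble onto them via boxing or interning.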

When debugging, this looks baffling: two completely unrelated components deadlocked on java.lang.Integer@2a742aa2, and we’re sitting there trying to figure out what they could possibly have in common.

We only get the class name plus identity hash code of the lock, because the monitor lives inside the object header. There is no per-call-site lock instance visible, as the lock is the object.

The good news is that modern Java actively fights this footgun.

Since JDK 16 (JEP 390), Integer, Double, and other wrapper types are officially designated as “Value-Based Classes.” Because their identity isn’t meant to matter, locking on them is conceptually wrong.

If you compile that TenantService code today, javac will immediately yell at you with a synchronization warning:

TenantService.java:4: warning: [synchronization] attempt to synchronize on an instance of a value-based class
        synchronized (tenantId) {
        ^
1 warning

The JVM gives us a dedicated flag to hunt down these traps at runtime. Setting -XX:DiagnoseSyncOnValueBasedClasses=2 will log a warning (and emit a JFR event) every time a thread attempts to lock on an instance of a value-based class. Setting it to 1 upgrades the warning to a fatal JVM error instead.

Lock Ordering and Deadlocks

The production deadlock (Example)

Imagine three components in a framework with lazy initialisation.

The first component is a registry for lazily-instantiated services:

java
public class ServiceRegistry {
    private final Map<String, Object> services = new HashMap<>();

    public synchronized Object getService(String name) {
        return services.computeIfAbsent(name, this::initializeService);
    }

    private Object initializeService(String name) {
        // initialisation may trigger class transformation
        Class<?> clazz = ClassTransformer.getInstance().transform(name);
        return instantiate(clazz);
    }
}

The second component transforms classes and caches the result:

java
public class ClassTransformer {
    private static final ClassTransformer INSTANCE = new ClassTransformer();
    private final Map<String, Class<?>> cache = new HashMap<>();

    public synchronized Class<?> transform(String className) {
        return cache.computeIfAbsent(className, this::doTransform);
    }

    private Class<?> doTransform(String className) {
        // transformation needs config values for the class
        String setting = ConfigManager.getInstance().getProperty(className + ".strategy");
        // ... bytecode transformation using setting ...
        return transformedClass;
    }

    public static ClassTransformer getInstance() { return INSTANCE; }
}

And the last one holds the application config with live-reload support:

java
public class ConfigManager {
    private static final ConfigManager INSTANCE = new ConfigManager();

    private Map<String, String> properties = new HashMap<>();
    private final List<Runnable> listeners = new ArrayList<>();

    public synchronized String getProperty(String key) {
        return properties.get(key);
    }

    public synchronized void reload() {
        this.properties = loadFromDisk();
        // notify dependents that config changed
        for (Runnable listener : listeners) {
            listener.run();  // listener might call back into ServiceRegistry
        }
    }

    public static ConfigManager getInstance() { return INSTANCE; }
}

Each one was developed independently, tested in isolation, and verified to be thread-safe on its own. But put together, they form a deadlock cycle that can survive years of testing before sporadically surfacing in production.

Normal flow works fine:

text
request
  |
  +--> ServiceRegistry.getService()
         |
         +--> service already cached
                |
                +---> returns immediately

No contention, no cross-component calls.

Under load, however, the flow might turn deadly:

Thread A: ServiceRegistry.lock          -> initializeService()
          -> ClassTransformer.lock      -> doTransform()
          -> needs ConfigManager.lock

Thread B: ConfigManager.lock            -> reload()
          -> listener.run()             -> calls ServiceRegistry.getService()
          -> needs ServiceRegistry.lock         <== BLOCKED

Thread A: -> waiting for ConfigManager.lock     <== BLOCKED

We got a deadlock!

Thread A holds Registry, and needs Config.
Thread B holds Config, and needs Registry.

And that’s with just two threads involved. Add a third thread and it gets even more complicated to follow.

ClassTransformer might be in the middle of a different transformation when the ConfigManager.reload() call tries to notify it:

Thread A: ServiceRegistry.lock  -> needs ClassTransformer.lock
Thread B: ClassTransformer.lock -> needs ConfigManager.lock
Thread C: ConfigManager.lock    -> needs ServiceRegistry.lock

Now we have a circular wait: A -> B -> C -> A

Why didn’t testing catch this problem?

Because it only becomes a problem when initialisation, class loading, and the config reload overlap in time. In tests, everything is initialised sequentially, and reload() isn’t called concurrently with first access.

In production, under actual load, the three components working together over a longer time create the opportunity for that small time window to deadlock. It works 999 times out of 1,000, but then it locks up at 3 AM when you’re on call…

What this example shows is that no single synchronized block is the root cause; it’s the interaction between them, and the coarse locking makes it worse.

Each component works as intended. What’s missing is a global lock ordering contract across them, ensuring locking follows a deterministic pattern.

When this finally surfaces in production, the first tool to reach for is jcmd <pid> Thread.print (or jstack -l <pid> on older JDKs). It will print a “Found one Java-level deadlock:” section that names every thread involved and the monitor each one is waiting for:

Found one Java-level deadlock:
=============================
"thread-A":
  waiting to lock monitor 0x00007f... (object 0x..., a ConfigManager),
  which is held by "thread-B"
"thread-B":
  waiting to lock monitor 0x00007f... (object 0x..., a ServiceRegistry),
  which is held by "thread-A"

That output maps directly onto the lock cycle, without any guessing which components are involved.

Why synchronized methods make this worse

Every component synchronizes on the entire method body. But ServiceRegistry.getService() only needs the lock for the map lookup; the ClassTransformer.transform() call doesn’t touch services and shouldn’t hold the registry lock:

java
public Object getService(String name) {

    // Acquire lock only for the work that needs to be thread-safe
    synchronized (this) {
        Object existing = services.get(name);
        if (existing != null) {
            return existing;
        }
    }

    // Lock is released here, it's safe to call out
    Object newService = initializeService(name);

    // Back to thread-safe work
    synchronized (this) {
        // double-check after reacquiring
        return services.computeIfAbsent(name, k -> newService);
    }
}

This approach doesn’t eliminate the need for a lock ordering strategy, but it dramatically shrinks the window in which ordering violations can occur.

The rule: never call out to another component while holding your own lock, if you can avoid it.

Joshua Bloch famously coined the term “alien method” for this in Effective Java.

Lock Ordering Contracts

The first part, the lock ordering contract, means defining a deterministic order over all locks in a system. That doesn’t have to mean the whole application, but a set of components working together, forming a system, like ConfigManager/ClassTransformer/ServiceRegistry.

The second part is enforcing the contract, so every thread acquires locks only in that order. If we hold ClassTransformer’s lock, we may acquire ServiceRegistry’s, but never ConfigManager’s. Circular waits become structurally impossible.
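
As a minimal sketch of such a contract (the Account type and its id field are assumptions for illustration): order lock acquisition by a unique, comparable key, so every thread takes the monitors in the same sequence regardless of call direction:

```java
import java.math.BigDecimal;

// Sketch of a lock ordering contract: always acquire monitors in ascending
// id order, so a circular wait is structurally impossible.
public class Bank {

    public static class Account {
        final long id;                 // unique key defining the lock order
        BigDecimal balance;

        Account(long id, BigDecimal balance) {
            this.id = id;
            this.balance = balance;
        }

        void debit(BigDecimal amount)  { balance = balance.subtract(amount); }
        void credit(BigDecimal amount) { balance = balance.add(amount); }
    }

    public void transfer(Account from, Account to, BigDecimal amount) {
        // Both directions of a transfer between the same two accounts
        // lock the lower id first, so no deadlock can form.
        Account first  = from.id < to.id ? from : to;
        Account second = first == from ? to : from;

        synchronized (first) {
            synchronized (second) {
                from.debit(amount);
                to.credit(amount);
            }
        }
    }
}
```

The ordering key can be anything stable and unique; System.identityHashCode is a common fallback when no natural id exists, with a tie-breaker lock for the rare collision.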

Sounds great, but in practice, this is hard to enforce, especially with synchronized.

Lock contracts often live only in documentation and code review discipline. The compiler and the JVM won’t stop us from violating it.

In a small, tightly-owned codebase it’s usually manageable, especially after getting a call that production is down at 3 AM.

In frameworks where components are developed independently or contributed by different people, it tends to erode over time.

That’s why a different approach is needed: java.util.concurrent.locks.ReentrantLock

Instead of preventing ordering violations, we detect them at runtime by bailing out when a lock isn’t immediately available, avoiding the deadlock even without a global ordering.

More on that in the alternatives section below.

Invisible performance cliffs

The Safety Bottleneck

We want to use concurrency to get the best performance; why else did we get that nice 32-core CPU? The only problem is that in certain situations we need synchronized to make it safe, like processing shop orders:

java
public class OrderProcessor {
    private final List<Order> ledger = new ArrayList<>();

    public synchronized Receipt process(Order order) {
        validate(order);                    //  1ms: CPU work
        Receipt receipt = persist(order);   //  4ms:  database round-trip
        this.ledger.add(order);             // <1ms
        return receipt;
    }
}

The synchronized method holds the monitor for ~5ms per call, including the database round-trip that doesn’t touch shared state.

Synchronizing that method creates a natural bottleneck, as our concurrent application can only use it sequentially. Let’s get our calculators out and check what that actually means:

Lock hold time:                 ~5ms per request
Max synchronized throughput:    1000ms / 5ms = 200 requests/sec

So we could have 32 threads in the pool, and still, only a single one doing actual work! The other 31 threads are parked, waiting, consuming memory and contributing to GC pressure.

Our 32-thread server is degraded to a single-threaded server with 31 spectators, thanks to thread-safety.

As with the other issues mentioned before, the simplest fix is using synchronized more like a scalpel, not an axe:

java
public Receipt process(Order order) {
    validate(order);                        // no lock needed
    Receipt receipt = persist(order);       // no lock needed

    // Lock working on the shared state
    synchronized (lock) {
        this.ledger.add(order);             // <1ms under lock
    }

    return receipt;
}

The lock hold time drops from 5ms to under 1ms. We still have a bottleneck, but reduced its length to only what’s necessary.

The danger of the original code was that it degrades gradually. Under light load, 5ms hold time is invisible. There’s rarely contention, and response times look fine. As traffic grows, threads start queuing behind the lock, and p99 latency starts to creep up.

The thread pool needs to grow to compensate. GC pauses lengthen from the extra live, parked threads. But no exception is thrown and no deadlock is detected. Monitoring shows “slow database” because that’s where threads are spending wall-clock time, masking that the real bottleneck is lock contention around the database call.

This is especially insidious when the I/O buried inside the synchronized block has variable latency. A database call that’s 4ms at p50 might be 80ms at p99. Under that tail latency, every other thread backs up behind the one holding the lock through a slow query:

  • Thread 1 hits a slow query and holds the lock for 80ms
  • Threads 2–32 are all parked, waiting for Thread 1’s monitor
  • Effective throughput during that 80ms is reduced to 12.5 req/sec instead of 200

Never hold a lock across I/O, network calls, or anything with unpredictable latency.

If we find ourselves needing to, that’s actually a design signal that our locking strategy might need rethinking, and not that the lock scope needs widening.

Convoy effect

Even when individual lock hold times are short, heavy contention creates another subtle problem.

When a thread releases a contended monitor, the OS scheduler must wake one of the parked waiters. That wake-up isn’t free, as it involves a context switch, cache line invalidation, and scheduler overhead.

Then, the newly woken thread acquires the lock, does its work, releases, and the cycle repeats.

Under high contention, this cycle degenerates into a pattern where threads take turns one at a time, each paying the full cost of a context switch.

The lock is technically short-held. But the overhead around each acquisition dominates:

Thread A: [ work ] ->  [ release ] -> [ context switch to Thread B ]
Thread B: [ work ] ->  [ release ] -> [ context switch to Thread C ]
Thread C: [ work ] ->  [ release ] -> [ context switch to Thread A ]
... for every thread ...

More contenders around the same locked resource means more context switches per second, and each switch flushes the CPU cache, meaning the work inside the critical section also gets slower.

And the worst thing is that it’s invisible in application code and in most monitoring. We won’t see it in thread dumps (no deadlock), and profilers will attribute time to the work inside the lock, not the scheduling overhead around it.

Latency metrics just show “everything got slower.”

async-profiler in lock profiling mode or JFR’s jdk.JavaMonitorEnter events with contention thresholds are the tools that make convoys visible.

If we see a monitor with thousands of contention events per second but sub-millisecond hold times, we have a convoy.
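
For reference, a contention recording can be captured with the standard JDK Flight Recorder tooling; a rough sketch (application name, file name, and duration are illustrative):

```shell
# Start the app with a 60-second JFR recording written to rec.jfr
java -XX:StartFlightRecording=duration=60s,filename=rec.jfr -jar app.jar

# Afterwards, inspect the contended monitor-enter events
jfr print --events jdk.JavaMonitorEnter rec.jfr
```

The jfr command-line tool ships with the JDK since version 12; older setups can open the same recording in JDK Mission Control instead.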

Monitor inflation

Monitors start out as lightweight, stack-based locks that cost almost nothing themselves.

Under contention, however, the JVM inflates these lightweight locks into their heavyweight alternative: ObjectMonitor, a C++ struct in the JVM that includes a mutex, a condition variable, a wait set, and an entry queue.

A single ObjectMonitor is still quite small. But in pathological designs (think thousands of contended objects, like a synchronized block per cache entry), thousands of inflated monitors add measurable memory overhead and GC pressure beyond the contention cost itself.

Each ObjectMonitor is (depending on JVM/platform) around 140-200 bytes, living in native memory, not the Java heap. They increase the amount of physical memory the process needs, and it’s not managed by the GC, meaning that heap metrics won’t show it to us.

It still affects the GC indirectly, though. The JVM must scan and potentially deflate monitors during safepoints. A large number of inflated monitors increases the time-to-safepoint metric, which affects every thread in the application.

JDK 15 moved monitor deflation off safepoints (JDK-8153224), and JDK 21 improved it further (JDK-8305994). But on older JDKs this can be a source of mysterious GC-adjacent pauses that don’t show up in GC logs.

If JFR shows a high count of jdk.JavaMonitorInflate events, it’s signalling that our locking granularity might be too fine. Many objects contended briefly rather than one object contended heavily.

The fix is often to coarsen the locking (one lock for the whole data structure instead of per-entry) or switch to a ConcurrentHashMap that manages its own striped locking internally.


Alternatives to synchronized

Now that we looked at all the subtle problems of synchronized, what alternatives are there?

ReentrantLock: synchronized with knobs

The java.util.concurrent.locks.ReentrantLock provides the same thing as synchronized does: mutual exclusion.

Unlike the keyword, though, it gives us control over the lock itself, not just what to lock on. But with greater power also comes greater responsibility:

java
private final ReentrantLock lock = new ReentrantLock();

public void process(Order order) {

    // ACQUIRE: unlike synchronized, nothing prevents us from forgetting this 
    this.lock.lock();

    try {
        this.ledger.add(order);
    }
    finally {
        // RELEASE: should be done in a finally block, or an exception
        //          leaves the lock permanently held
        this.lock.unlock();
    }
}

The try/finally ceremony is the price of admission, and honestly, it’s non-negotiable. Even if we think the locked code won’t throw an exception, that might not be true in the future, or even to begin with. Forgetting to unlock in a finally can leave the lock permanently held.

synchronized handles this implicitly for us via the compiler-inserted monitorexit (more on that later).

This isn’t just a minor ergonomic difference, it’s a real source of bugs, especially in evolving code over time. Someone adds an early return, someone wraps a section in a new try/catch, and the finally block is suddenly in the wrong place…

So why pay that price?

Because we gain three new capabilities!

Deadlock Avoidance instead of Deadlock Prevention

Imagine a banking system to transfer money between two accounts. Such an operation definitely requires mutual exclusion.

java
public void transfer(Account from, Account to, BigDecimal amount) {
    synchronized (from) {
        synchronized (to) {
            from.debit(amount);
            to.credit(amount);
        }
    }
}

With synchronized, if two threads transfer money between the same two accounts in opposite directions, each can acquire its from monitor (which is the other thread’s to) and then wait forever for the second one: a deadlock.

As ReentrantLock.lock() blocks unconditionally just like synchronized, we need a better solution: tryLock()

java
public void transfer(Account from,
                     Account to,
                     BigDecimal amount) throws InterruptedException,
                                               TransferException {

    // tryLock(timeout) waits up to the timeout and returns false if the lock
    // couldn't be acquired in time
    if (from.lock.tryLock(100, TimeUnit.MILLISECONDS)) {
        try {
            if (to.lock.tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    from.debit(amount);
                    to.credit(amount);
                    return;         // SUCCESS: both locks held, transfer complete
                }
                finally {
                    // RELEASE
                    to.lock.unlock();
                }
            }
        }
        finally {
            from.lock.unlock();     // RELEASE first lock whether or not we got the second
        }
    }

    // If we reach here, at least one lock wasn't available within the timeout.
    // Unlike a deadlock, we can handle this as the system is still live.
    throw new TransferException("Transfer could not be completed, please try again");
}

The tryLock(100, TimeUnit.MILLISECONDS) call adds a waiting mechanism: if the lock isn’t available within that window, give up and report failure rather than hanging indefinitely. For request-scoped work, returning a 503 after 100ms is vastly preferable to a thread that hangs forever.

There’s also a zero-argument tryLock() that doesn’t wait at all, but in practice we usually want a timeout. The pure, non-blocking tryLock() would require handmade retry loops, which can be their own kind of hazard, like livelocking under heavy contention.

Livelocks happen when two or more threads keep responding to each other’s actions without making any actual progress. Like two people on the sidewalk stepping aside in the same direction over and over to let the other pass. Unlike a deadlock, the threads aren’t blocked. Instead they are simply busy doing the wrong thing, like trying to acquire a lock and not getting it.
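
One way to tame that, sketched below under the assumption that failing after a few attempts is acceptable, is a bounded retry loop with randomized backoff, so contending threads stop colliding in lockstep:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;

public class BackoffExample {

    private final ReentrantLock lock = new ReentrantLock();

    // Sketch: bounded retries with small randomized sleeps. The random
    // backoff breaks the symmetry between contenders that causes livelock.
    public boolean runWithRetry(Runnable task, int maxAttempts)
            throws InterruptedException {

        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (lock.tryLock()) {
                try {
                    task.run();
                    return true;
                } finally {
                    lock.unlock();
                }
            }
            // Didn't get the lock: back off for a random 1-9ms before retrying
            Thread.sleep(ThreadLocalRandom.current().nextLong(1, 10));
        }
        return false;  // report failure instead of spinning forever
    }
}
```

Bounding the attempts turns a potential livelock into an explicit failure the caller can handle, just like the tryLock(timeout) variant above.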

Cancellation Support

When using synchronized, a thread waiting to enter a monitor cannot be interrupted. A Thread.interrupt() call sets the interrupt flag, but the thread stays parked until it acquires the monitor.

ReentrantLock.lockInterruptibly() respects interruption by throwing an InterruptedException if it receives an interrupt while waiting:

java
public void process(Order order) throws InterruptedException {
    // If interrupted while waiting for the lock, lockInterruptibly() throws
    // InterruptedException and the lock is NOT acquired, so unlock() must not
    // run. Placing the call before the try block achieves exactly that.
    this.lock.lockInterruptibly();

    try {
        this.ledger.add(order);
    }
    finally {
        this.lock.unlock();
    }
}

That’s critical for clean shutdown, so threads don’t hang waiting for a monitor that may never release.
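
A small self-contained sketch of that shutdown behaviour: the main thread holds the lock, a worker parks in lockInterruptibly(), and an interrupt wakes the worker instead of leaving it hanging:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ShutdownDemo {

    // Returns true if the parked worker was woken by the interrupt and exited
    public static boolean demo() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();  // hold the lock so the worker has to wait

        Thread worker = new Thread(() -> {
            try {
                lock.lockInterruptibly();   // parks, but stays interruptible
                try {
                    // ... would do the actual work here ...
                } finally {
                    lock.unlock();
                }
            } catch (InterruptedException e) {
                // Shutdown path: the lock was never acquired, so no unlock here
            }
        });
        worker.start();

        Thread.sleep(100);     // let the worker park on the lock
        worker.interrupt();    // shutdown signal wakes the parked worker
        worker.join(2000);

        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("clean shutdown: " + demo());
    }
}
```

With a synchronized block in place of lockInterruptibly(), the interrupt flag would be set but the worker would stay parked until the monitor was released.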

Fairness Support

By default, ReentrantLock makes no guarantee about which waiting thread acquires the lock next. A newly arriving thread can be a bully, barging ahead of threads that have been waiting longer.

Enabling fairness changes that:

java
// Fair lock: longest-waiting thread always acquires next
private final ReentrantLock lock = new ReentrantLock(true);  // true = fair

A fair lock guarantees FIFO ordering, which eliminates starvation, where one thread reacquires the same lock repeatedly while others wait indefinitely. But nothing comes for free.

The tradeoff is throughput.

Fair locks are measurably slower under contention because barging is precisely what makes unfair locks fast.

In most applications the default is fine. Fair locks are mostly for the cases where you need bounded worst-case latency per thread.

When ReentrantLock is just Ceremony

If we don’t need tryLock(), interruptibility, or fairness, then synchronized is simpler and equally fast.

The JVM applies the same lock optimisation tiers (thin -> inflated) to both. Don’t reach for ReentrantLock just because it feels more professional.

The try/finally burden is real, and synchronized is harder to get wrong.

ReadWriteLock: the Read-Heavy Optimisation

Many shared data structures are read far more often than written. synchronized and ReentrantLock make every reader wait for every other reader, which can be needlessly conservative.

The ReadWriteLock, as it says on the label, provides separate locks for reading and writing. Multiple threads can hold the read lock simultaneously. The write lock is exclusive: it waits for all active readers to finish and blocks new readers until the write is done. That way, every reader sees either the state before the write or after it, never a half-applied change.

java
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Map<String, Config> configs = new HashMap<>();

// Multiple threads can hold the read lock simultaneously.
// The locking makes sure that it won't be read during a write.
public Config getConfig(String key) {
    this.rwLock.readLock().lock();

    try {
        return this.configs.get(key);
    }
    finally {
        this.rwLock.readLock().unlock();
    }
}

// Write lock is exclusive: waits for all active readers to drain,
// then blocks both new readers and writers until the write completes.
public void updateConfig(String key, Config value) {
    this.rwLock.writeLock().lock();

    try {
        this.configs.put(key, value);
    }
    finally {
        this.rwLock.writeLock().unlock();
    }
}

For a config store read 1,000 times per second and updated once a minute, this is a massive win over synchronized.

There’s a caveat, of course.

If reads and writes are roughly equal in frequency, ReadWriteLock is slower than a plain ReentrantLock because of the overhead of tracking multiple concurrent readers. It earns its keep only when the read/write ratio is strongly skewed towards read.

Another approach for read-heavy code is the StampedLock. It takes things further with optimistic reads: tryOptimisticRead() doesn’t acquire a lock at all; afterwards, validate() checks whether a write happened in the meantime.

The fast path is zero-contention, but the API is genuinely treacherous and it’s not reentrant.

java
private final StampedLock lock = new StampedLock();

private double x, y;

public double distanceFromOrigin() {
    // tryOptimisticRead() doesn't acquire the lock — it just returns a stamp
    // representing the current lock state.
    // Cost is near zero.
    long stamp = lock.tryOptimisticRead();

    double currentX = this.x;
    double currentY = this.y;

    // VALIDATE: Has a write lock been acquired since we took the stamp?
    //           If we forget this check, we silently operate on potentially torn data.
    if (!this.lock.validate(stamp)) {
        // If yes, our reads of x and y may be inconsistent.
        // A writer could have changed x but not yet y when we read them.
        // We fall back to a real read lock.
        stamp = this.lock.readLock();
        try {
            currentX = this.x;
            currentY = this.y;
        }
        finally {
            this.lock.unlockRead(stamp);
        }
    }

    return Math.sqrt(currentX * currentX + currentY * currentY);
}

The optimistic path is extremely fast when writes are rare: no lock acquisition, no CAS, just a stamp check.

But forgetting validate() means our reads may reflect a half-written state: a writer updated x but not yet y when we read both, and we never noticed.

There’s no exception, no warning… just subtly wrong results.

For most scenarios, ReadWriteLock is the safer choice unless profiling shows it’s a genuine bottleneck.

No Explicit Locks At All

Manual lock handling provides a lot of freedom, but also a lot of possible headaches down the road. Sometimes, the best lock is the one we don’t have to manage ourselves.

Concurrent Collections

For shared key-value access, ConcurrentHashMap eliminates explicit locking entirely:

java
private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

public void increment(String key) {
    // No explicit locking required 
    counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
}

It handles synchronization internally using per-bucket CAS (compare-and-swap) operations in modern JDKs. Before Java 8, it used a fixed array of ReentrantLock-guarded segments (16 by default); Java 8 replaced that with per-bucket CAS and synchronized on individual bucket heads for tighter granularity. We get concurrency without ever writing our own synchronized or lock() calls.

The compute, merge, and computeIfAbsent methods are atomic per-key, which covers most read-modify-write patterns.
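For a plain counter map, merge can even replace the AtomicLong wrapper entirely. A minimal sketch (the HitCounter class and its method names are illustrative, not from the text above):

```java
import java.util.concurrent.ConcurrentHashMap;

class HitCounter {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    public void increment(String key) {
        // merge is atomic per key: if the key is absent, store 1L;
        // otherwise combine the existing value with 1L via Long::sum.
        this.counters.merge(key, 1L, Long::sum);
    }

    public long get(String key) {
        return this.counters.getOrDefault(key, 0L);
    }
}
```

Each merge call is a single atomic read-modify-write on that key, so no increments are lost even under contention, and we never touch a lock ourselves.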

Atomics

For single-variable updates, atomics use CAS at the CPU instruction level: no monitor, no thread parking, no context switches, no convoy effect. On x86, AtomicInteger.incrementAndGet() typically compiles down to a single lock xadd instruction: a hardware-level atomic add with no kernel involvement.

java
private final AtomicLong counter = new AtomicLong();

public void increment() {
    this.counter.incrementAndGet();
}

Within that single-variable niche, atomics are strictly better than locking.

LongAdder goes further for high-contention counters by striping across multiple cells, trading a slightly more expensive sum() for dramatically cheaper increments.
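The LongAdder drop-in looks almost identical to the AtomicLong version; a sketch (class name is illustrative):

```java
import java.util.concurrent.atomic.LongAdder;

class RequestCounter {
    private final LongAdder counter = new LongAdder();

    public void increment() {
        // Under contention, threads land on different internal cells,
        // so increments rarely collide on the same CAS target.
        this.counter.increment();
    }

    public long total() {
        // sum() walks all cells: slightly more expensive, and only a
        // snapshot if increments are still in flight on other threads.
        return this.counter.sum();
    }
}
```

The trade-off is exactly as described above: cheap, contention-friendly increments in exchange for a sum() that is a snapshot rather than an exact point-in-time read.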

volatile fields

The volatile keyword is the lightest tool in the box. Across threads, it guarantees visibility (every read sees the latest write) and ordering (writes before a volatile store are visible to threads that read the volatile):

java
// Visibility: every read sees the most recent write across threads.
private volatile boolean shutdownRequested = false;

public void shutdown() {
    // write is immediately visible to all threads
    this.shutdownRequested = true;
}  

public void processLoop() {
    // guaranteed to see the latest value: no stale cache
    while (!this.shutdownRequested) {  
        // do work
    }
}

What it does not provide is atomicity for compound operations!

count++ on a volatile count is still a race, as it’s actually three separate steps under the hood to increment a variable.
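Spelled out, those three hidden steps look like this (a sketch; the RacyCounter class and count field are hypothetical):

```java
class RacyCounter {
    private volatile int count = 0;

    public void increment() {
        // count++ is really three separate operations:
        int tmp = this.count;   // 1. read   (volatile: sees the latest value)
        tmp = tmp + 1;          // 2. modify (purely local, invisible to others)
        this.count = tmp;       // 3. write  (volatile: published immediately)
        // Another thread can interleave between steps 1 and 3, so two
        // concurrent increments may produce +1 instead of +2.
    }

    public int get() {
        return this.count;
    }
}
```

volatile makes each individual read and write visible, but nothing prevents the interleaving between them; that gap is exactly what AtomicInteger’s CAS closes.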

Use it for flags and published references, not for counters or state machines.

What to Choose When?

With all these options available, the choice can feel overwhelming. Thankfully, most scenarios map cleanly to one tool:

| Scenario                                    | Reach for                                        |
| ------------------------------------------- | ------------------------------------------------ |
| Short critical section, low contention      | synchronized: simple, the JVM optimises it well  |
| Need timeout, interruptibility, or try-lock | ReentrantLock                                    |
| Read-heavy, infrequent writes               | ReadWriteLock                                    |
| Single counter or flag                      | AtomicLong, AtomicBoolean, volatile              |
| High-contention counter                     | LongAdder                                        |
| Concurrent key-value access                 | ConcurrentHashMap with compute/merge             |
| Multiple locks, no clear ordering           | tryLock() with back-off                          |

The first row is still the most important one: synchronized should be the default, not the fallback. Every alternative here exists because synchronized didn’t fit a specific use case, not because it’s inherently inferior.


Virtual Threads Changed the Game

Project Loom’s virtual threads promise to make thread-per-request servers viable at massive scale. Since JDK 21, we no longer need a fixed pool of expensive platform threads. Instead, we can spin up millions of cheap virtual threads without worrying about memory or scheduling overhead.
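As a sketch of that scale, the thread-per-task executor added alongside virtual threads creates one virtual thread per submitted task (the demo class and task count are illustrative; requires JDK 21+):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

class VirtualThreadDemo {
    static long runTasks(int taskCount) {
        LongAdder completed = new LongAdder();

        // One cheap virtual thread per task; no fixed pool to size.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(completed::increment);
            }
        } // close() waits for all submitted tasks to finish

        return completed.sum();
    }
}
```

Submitting tens of thousands of tasks this way is routine, where the same number of platform threads would exhaust memory long before.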

But synchronized was designed with platform threads in mind, and interacts with virtual threads in a way that quietly undermines that promise, at least until Java 24.

Carrier Thread Pinning

Virtual threads are scheduled onto a small pool of OS-level carrier threads, typically one per CPU core. When a virtual thread blocks on I/O or a lock, the JVM is supposed to unmount it from its carrier, park it cheaply, and let the carrier pick up another virtual thread. This is what makes the model work: a carrier thread stays busy doing useful work while thousands of virtual threads wait.

synchronized breaks this.

When a virtual thread enters a synchronized block, it gets pinned to its carrier thread. That means the JVM can no longer unmount it, even if it blocks inside the critical section. The carrier thread is stuck waiting alongside the virtual thread.

Carrier thread 1: [virtual thread A] <- blocked in synchronized, waiting for I/O
                   ^
                   |
                pinned (carrier cannot pick up other work)

Carrier thread 2: [virtual thread B] <- doing useful work
Carrier thread 3: [virtual thread C] <- doing useful work
Carrier thread 4: [virtual thread D] <- doing useful work

Virtual threads E, F, G, H...: waiting to be scheduled, but Carrier 1 is pinned

With a pool of, say, 8 carrier threads, a single pinned synchronized block occupying one carrier reduces your effective parallelism by 12.5%. If multiple threads are blocked in synchronized simultaneously, the degradation multiplies.

And the worst part is that pinning doesn’t fail in a visible manner. There’s no exception or even a warning in logs by default. Our application just gets slower under load in a way that looks like CPU saturation or an I/O bottleneck, because the carrier threads appear busy.

The actual throughput gain we expected from virtual threads simply doesn’t materialise.

Pinning is also contagious in layered code.

A synchronized block deep inside a library we don’t even control, let’s say a JDBC driver or a third-party cache, will pin our virtual threads just as effectively as one we wrote ourselves. We can audit and fix our own code and still be bitten by a dependency.

As this is a real issue, we can make pinning visible by setting the JVM flag -Djdk.tracePinnedThreads=full. It emits a stack trace whenever a virtual thread is pinned, letting you identify both your own code and offending library code:

Thread[#26,ForkJoinPool-1-worker-1,5,CarrierThreads]
    com.example.OrderProcessor.process(OrderProcessor.java:18) <== monitors:1

But don’t forget to remove it in production! Tracing threads isn’t free.

Non-Pinning ReentrantLock

ReentrantLock was built on LockSupport.park(), which the virtual thread scheduler understands.

When a virtual thread blocks on a ReentrantLock, it is cleanly unmounted from its carrier, which is free to pick up another virtual thread immediately:

java
// BEFORE: pins the carrier thread while waiting
public synchronized Receipt process(Order order) {
    // I/O inside synchronized: carrier is stuck for the duration
    return persist(order); 
}

// AFTER: carrier is free to schedule other virtual threads while waiting
private final ReentrantLock lock = new ReentrantLock();

public Receipt process(Order order) {
    this.lock.lock();

    try {
         // virtual thread parks cleanly, carrier moves on
         return persist(order);
    }
    finally {
        this.lock.unlock();
    }
}

This is why migrating hot synchronized blocks to ReentrantLock is the practical recommendation for any codebase adopting virtual threads.

JEP 491 and the Road Ahead

JDK 24 introduced JEP 491, which eliminates pinning for synchronized entirely. Virtual threads can now unmount from their carrier even inside a synchronized block.

If you’re on JDK 24+, this concern largely goes away for new code.

The catch is that JDK 24 is definitely not yet the production standard for most organizations. On prior versions, pinning is real and synchronized in hot paths is a genuine problem for virtual thread throughput.

My practical recommendation is that if your codebase is adopting virtual threads on JDK 21, audit synchronized usage in any code that may block:

  • I/O
  • Database calls
  • Network operations
  • Slow computations

Those are the pinning hotspots worth migrating to ReentrantLock.

Short, CPU-only critical sections are less likely to cause visible degradation since the carrier isn’t blocked for long even when pinned.


Under the Hood: What the JVM Actually Does

The pitfalls described earlier aren’t arbitrary. Instead, they are a direct consequence of how synchronized is implemented. That’s why it’s time to take a closer look at what’s actually happening.

Object headers and the mark word

Every Java object has a header invisible to application code. There are two things stored there: a pointer to class metadata, and the one we’re interested in, the mark word.

The mark word is a 64-bit slot the JVM repurposes for different things depending on context:

+----------------- 64 bits ----------------+
| Identity hashcode (31 bits)              |  normal state
| GC age (4 bits) | lock state (2 bits)    |
+------------------------------------------+

+------------------------------------------+
| Biased thread ID (54 bits) | epoch | age |  biased locked (deprecated JDK 15, removed JDK 18)
+------------------------------------------+

+------------------------------------------+
| Pointer to lock record on thread stack   |  thin locked
+------------------------------------------+

+------------------------------------------+
| Pointer to inflated ObjectMonitor        |  heavyweight locked
+------------------------------------------+

+------------------------------------------+
| (empty)                                  |  marked for GC
+------------------------------------------+

This is why the lock is the object: there’s no separate lock table. Locking an object modifies its header in place.

That directly explains the interning trap from using Integer.valueOf(42) as a lock. The cached Integer instance for 42 has exactly one mark word, so any code synchronizing on it uses the same monitor: the monitor belongs to the object, not the call site.
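We can observe that shared identity directly; Integer.valueOf caches -128 to 127 by default (the demo class is illustrative):

```java
class IntegerCacheDemo {
    static boolean sameInstance(int value) {
        // valueOf returns the cached instance for small values, so both
        // references point at one object: one mark word, one monitor.
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // identity check, not equals()
    }
}
```

Two “independent” locks on Integer.valueOf(42) are therefore the same lock; for a value like 4242, outside the default cache range, they are not.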

Java 6 added biased locking so a lock acquired repeatedly by the same thread could skip the CAS entirely: the thread merely checked whether its own ID was already stamped in the mark word. But uncontended CAS on modern hardware became fast enough that biased locking’s internal complexity cost more than it saved. That’s why it was disabled and deprecated in JDK 15 (JEP 374) and removed in JDK 18 (JDK-8256425).

The monitor lifecycle

The JVM doesn’t reach immediately for an OS mutex when our code enters a synchronized block. Instead, it tries progressively heavier mechanisms, escalating only when necessary.

Thin locks (stack-based)

When a thread enters a synchronized block, the JVM takes the lightweight approach first.

It stamps the object’s mark word with a pointer back to a small lock record on the acquiring thread’s stack. Instead of reaching for an OS call or the kernel, a single atomic CPU instruction is enough to perform a CAS (compare-and-swap). If it succeeds, the thread owns the lock.

Re-entrancy, meaning the same thread locking the object again, is handled by pushing an additional lock record with a null displaced mark word, which effectively acts as a counter. Releasing the lock does the reverse, restoring the original mark word.

Furthermore, the JVM can aggressively optimise thin locks:

  • Lock coarsening: merging adjacent synchronized blocks on the same object.
  • Lock elision: removing the lock entirely if escape analysis proves the object never leaves the thread.
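Coarsening, for example, can conceptually turn back-to-back blocks into a single acquisition. A sketch of the idea (the JIT does this at the machine-code level, not in source; the demo class is illustrative):

```java
class CoarseningDemo {
    private final StringBuilder buffer = new StringBuilder();

    // As written: two separate acquire/release pairs on the same monitor.
    public void appendTwice(String a, String b) {
        synchronized (this.buffer) { this.buffer.append(a); }
        synchronized (this.buffer) { this.buffer.append(b); }
    }

    // What the JIT may effectively execute after coarsening:
    // one acquire, both appends, one release.
    public void appendTwiceCoarsened(String a, String b) {
        synchronized (this.buffer) {
            this.buffer.append(a);
            this.buffer.append(b);
        }
    }

    public String contents() {
        synchronized (this.buffer) {
            return this.buffer.toString();
        }
    }
}
```

Both variants produce the same result; coarsening simply removes the redundant unlock/lock pair in the middle.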

Inflated monitors (heavyweight)

When a second thread attempts to acquire the same lock while the first one is still holding it, it’s time to get out the big guns.

The thin lock is inflated into a full ObjectMonitor, a C++ struct in native memory:

ObjectMonitor {
    _owner       // thread currently holding the lock
    _EntryList   // threads waiting to acquire
    _WaitSet     // threads in Object.wait()
    _recursions  // re-entrancy counter
}

Waiting threads are parked via an OS mutex plus a condition variable, meaning a real kernel call that suspends the thread and triggers a context switch.

That’s why contention is expensive, and not the locking itself. As soon as we need to involve the kernel, we have to pay the price for it.

Once a monitor is inflated, it typically stays that way for the object’s lifetime, although modern JDKs can deflate them. This behaviour is the explanation for the ballooning memory concern earlier, as the inflated monitors live in native memory, invisible to heap metrics we usually measure.

The Actual Bytecode

The compiler translates a synchronized block into explicit bytecode instructions.

A simple block like this:

java
synchronized (this.lock) {
    this.counter++;
}

translates roughly to:

aload_0         // push this
getfield        // push this.lock
dup
astore_2        // stash the lock reference for the exit paths
monitorenter    // acquire monitor
...             // this.counter++ (getfield / iadd / putfield)
aload_2         // push lock reference again
monitorexit     // release monitor (happy path)
goto end
aload_2         // exception handler: push lock reference
monitorexit     // release monitor (exceptional path)
athrow          // rethrow

There are two monitorexit instructions: one for the normal exit path and one extra to handle exceptions.

This is the same guarantee ReentrantLock’s try/finally provides manually: release the lock on every path. With synchronized, the compiler generates both exits for us, making it impossible to forget the way we can forget ReentrantLock.unlock().


Know Your Tools

The synchronized keyword has been in Java since 1.0, and such longevity sometimes leads us developers to treat it as a legacy concept, especially given the richer concurrency tooling introduced in recent years. But replacing it wholesale with ReentrantLock or concurrent collections is the wrong conclusion to draw from its age alone.

The use-cases where synchronized shines and is the right choice are genuinely common:

  • Short critical section
  • Single shared field
  • Simple guard around a mutable collection

These cases are easy to get wrong with the more complex locking mechanisms, while synchronized handles them with no try/finally ceremony, and the JVM can aggressively optimise small synchronized blocks.

The fact that it can cause problems in certain scenarios doesn’t make it a bad tool from the start. What makes concurrent code go wrong isn’t the choice between synchronized and ReentrantLock et al. It’s a misunderstanding of what we’re trying to protect:

  • Synchronizing on the wrong object
  • Holding a lock across I/O we didn’t notice
  • Building two independent “correct” components that form a deadlock cycle when composed

Those are the things that lead to concurrent bugs, and they all stem from an incomplete mental model.

The next time you reach for synchronized, you know what it costs, what it risks, and when something else serves you better. That’s the difference between using a concurrency primitive and actually understanding one.


Resources