Java uses java.util.Locale to format dates, numbers, currency, and more.
But in some circumstances, these formatted strings have changed with JDK 9, leading to a multitude of subtle (and sometimes not so subtle) bugs.
What Are Locales?
To better understand a problem, we must first know what we’re dealing with.
“A locale is a set of parameters that defines the user's language, region, and any special variant preferences that the user wants to see in her user interface.”
The core use of a locale is formatting many different kinds of data for output purposes:
- Date, time, weekdays, eras, etc.
- String collation
Locales are defined by a hierarchy of properties:
<language>[_<COUNTRY>[_<variant>]]

language: ISO 639 (mandatory)
COUNTRY:  ISO 3166 (optional)
variant:  any string (optional)

These three parameters are enough to define most combinations possible. Often a language is enough, but sometimes we need to be more specific. The country, and optionally the variant, narrow a locale down further:

Locale.GERMAN        => de
Locale.GERMANY       => de_DE
Locale.ENGLISH       => en
Locale.UK            => en_GB
Locale.CANADA        => en_CA
Locale.FRENCH        => fr
Locale.CANADA_FRENCH => fr_CA
And we’re not limited to the predefined set; custom locales are also supported.
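As a minimal sketch of how a custom locale could be created: the class and method names below (CustomLocales, austrianGerman, swissGerman) are made up for illustration, while Locale.forLanguageTag and Locale.Builder are standard JDK APIs. The older Locale constructors also work on JDK 8 and 11, but newer JDK releases mark them as deprecated.

```java
import java.util.Locale;

class CustomLocales {

    // Parse an IETF BCP 47 language tag (hyphen instead of underscore).
    static Locale austrianGerman() {
        return Locale.forLanguageTag("de-AT");
    }

    // Assemble a locale step by step; the builder validates each part.
    static Locale swissGerman() {
        return new Locale.Builder()
                .setLanguage("de")
                .setRegion("CH")
                .build();
    }

    public static void main(String[] args) {
        System.out.println(austrianGerman()); // de_AT
        System.out.println(swissGerman());    // de_CH
    }
}
```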
Where Do Locales Come from?
A well-established source for locales is the “Unicode Common Locale Data Repository” (CLDR). It’s the world’s largest and most extensive repository of Locale data available and is used by operating systems, Google Chrome, MediaWiki, etc.
But just like with any standard, it’s not the only source for locales.
JDK Default Locale Provider
Java retrieves locale data from a locale provider.
Up until JDK 8, the default provider was JRE, its own list of locale definitions.
JRE isn’t the only provider available in the JDK:
- JRE (Java’s own list)
- CLDR (Unicode Common Locale Data Repository)
- SPI (Service Provider Interface)
- Host (provided by the operating system)
What Changed With JDK 9?
JEP 252 changed the default provider from JRE to CLDR, starting with JDK 9. This change resulted in minuscule differences, which could cause big problems down the line.
These are the problems we’ve encountered (so far) upgrading to JDK 11.
Different date formats
The first thing that broke for us was (de-)serializing JSON data with Google’s GSON.
We use a custom-built sync service for Mailchimp subscribers, persisting relevant data. And with JDK 11, we could no longer read previously persisted data.
After building some test scenarios, we identified a slight difference in the date format:
Before: Jul 15, 2020 2:20:32 AM
After:  Jul 15, 2020, 2:20:32 AM
Notice the additional comma after the year? Where did that suddenly come from?
It turns out that CLDR defines some formats slightly differently than JRE did before.
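To check this on a given JDK, a small sketch like the following can help. It assumes the persisted strings came from DateFormat's default date/time style for Locale.US, which matches the two strings shown above; the class and method names are hypothetical. The exact output depends on the JDK version and the active locale provider, so no fixed result is claimed here.

```java
import java.text.DateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatCheck {

    // Formats a date with the locale-dependent default style for Locale.US.
    // The pattern behind this style comes from the active locale provider.
    static String formatDefault(Date date) {
        DateFormat format = DateFormat.getDateTimeInstance(
                DateFormat.DEFAULT, DateFormat.DEFAULT, Locale.US);
        // Pin the time zone so the output doesn't depend on the machine.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date date = Date.from(Instant.parse("2020-07-15T02:20:32Z"));
        // Run on different JDKs and compare the printed strings.
        System.out.println(System.getProperty("java.version") + ": " + formatDefault(date));
    }
}
```

Running the same class on JDK 8 and on JDK 11 should make the difference visible without involving any JSON library.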
When does a Week start?
Another problem came from the first day of the week.
We’re using Apache Tapestry for most of our systems, with “language-only” locales for maximal compatibility.
This means we’re using Locale.GERMAN (de) instead of Locale.GERMANY (de_DE).
There wasn’t any problem before. Everything was formatted the way it was supposed to look and the way we were used to. And the first day of the week was Monday, as expected.
After our switch from JDK 8 to 11, our date pickers started on Sunday instead.
Debugging through the related code revealed that the locale was still Locale.GERMAN, but the start day of the week was returned as Sunday.
A subtle difference between the CLDR and the JRE provider is how the different kinds of formats relate to a locale’s specificity.
JRE provided Monday as the start day of the week for Locale.GERMAN, even though it shouldn’t have!
CLDR doesn’t have a start day of the week for a “language-only” locale, simply because calendars shouldn’t be treated as language-dependent.
Instead, a region (or in more Java terms, a “country”) is responsible for defining calendars, including the start day of a week.
So our “language-only” German locale suddenly fell back to the default, country/territory 001: Sunday.
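The difference between a language-only and a country-specific locale can be inspected with a short sketch like this one. The class and method names are hypothetical; WeekFields and Calendar are standard JDK APIs. What the language-only locale returns depends on the JDK version and the locale provider, so only the country-specific result is annotated.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Calendar;
import java.util.Locale;

class FirstDayOfWeekCheck {

    // java.time view of the locale's week definition.
    static DayOfWeek firstDayOfWeek(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // Language-only: the result depends on the provider and JDK version.
        System.out.println("de    -> " + firstDayOfWeek(Locale.GERMAN));
        // Country-specific: Germany starts its week on Monday.
        System.out.println("de_DE -> " + firstDayOfWeek(Locale.GERMANY)); // MONDAY

        // The legacy Calendar API exposes the same information as an int constant.
        int legacy = Calendar.getInstance(Locale.GERMANY).getFirstDayOfWeek();
        System.out.println("Calendar de_DE is Monday: " + (legacy == Calendar.MONDAY));
    }
}
```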
We have a simple unit test for currency formatting:
It ran fine before, but now fails because the currency format changed significantly:
JDK 8      | JDK 9+
-----------|----------------
¤ 1.234,99 | 1.234,99\u00a0¤

¤ = Unicode Currency Symbol Placeholder
Just like with the start of the week, the problem was the “language-only” locale. But this time, it’s the other way around.
CLDR does provide a default format for de-based languages, along with more specific patterns for individual countries.
JRE, on the other hand, didn’t have a pattern for de-only, so it fell back to its default.
And to make it worse, a no-break space (U+00A0) is now being used to separate the value from the symbol.
If we had used Locale.GERMANY instead of Locale.GERMAN with JDK 8, we would have had almost the same result as with JDK 11.
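A sketch along these lines prints both variants side by side and makes the no-break space visible. It is not our original unit test; the class and helper names are invented for illustration, and the exact strings depend on the JDK version and locale provider.

```java
import java.text.NumberFormat;
import java.util.Locale;

class CurrencyFormatCheck {

    // Formats an amount with the locale's currency pattern.
    static String formatCurrency(Locale locale, double amount) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Replaces no-break spaces (U+00A0) with a readable marker for printing.
    static String visible(String text) {
        return text.replace("\u00a0", "<NBSP>");
    }

    public static void main(String[] args) {
        double amount = 1234.99;
        // Language-only locale: no country, so no concrete currency symbol.
        System.out.println("de    -> " + visible(formatCurrency(Locale.GERMAN, amount)));
        // Country-specific locale: formatted with the euro sign.
        System.out.println("de_DE -> " + visible(formatCurrency(Locale.GERMANY, amount)));
    }
}
```

When comparing formatted strings in tests, normalizing or explicitly expecting the no-break space avoids failures that are invisible in most diff views.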
How To Deal With the Changes
Now that we found multiple breaking changes in our formatting code, how should we deal with them?
Update our dependencies
Fixing our GSON problem was as easy as updating the dependency. The issue was fixed in 2.8.3, and we were using 2.8.2.
After some time in Maven dependency hell, (de-)serializing was working again flawlessly.
Adapt to the changes
Where possible, we should embrace the changes in our code.
The locale definitions by CLDR are more accurate and thought-out than before.
Using a well-defined standard used by many different systems, we can ensure better interoperability, and it’s more future-proof. If necessary, we could still wrap non-fixable code in a compatibility wrapper of our own, or provide different date formats ourselves.
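Providing a date format ourselves could look like the following sketch. The pattern is an assumption that mirrors the JDK 8 output shown earlier, and the class and constant names are hypothetical. With an explicit pattern, only the month and AM/PM texts still come from the locale data; the punctuation is fixed by us.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class FixedPatternDates {

    // Explicit pattern: separators no longer depend on the locale provider.
    static final DateTimeFormatter LEGACY_US =
            DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.US);

    static String format(LocalDateTime dateTime) {
        return LEGACY_US.format(dateTime);
    }

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, LEGACY_US);
    }

    public static void main(String[] args) {
        LocalDateTime dateTime = LocalDateTime.of(2020, 7, 15, 2, 20, 32);
        String text = format(dateTime);
        System.out.println(text);                         // Jul 15, 2020 2:20:32 AM
        System.out.println(parse(text).equals(dateTime)); // true
    }
}
```

For data exchanged between systems, a locale-independent format such as ISO 8601 (DateTimeFormatter.ISO_LOCAL_DATE_TIME) is usually the more robust choice.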
In the world of software development, being technically correct is sometimes just not possible. Think of legacy code, irreplaceable dependencies, closed systems we have to interact with, etc. The list of reasons why we can’t merely adapt to specific changes is long.
But the JDK has our back if we can’t adapt to them: the system property java.locale.providers.

java -Djava.locale.providers=<order of providers>

Available providers:
- CLDR (default)
- HOST (OS provided)
- SPI (Service Provider Interface)
- COMPAT (formerly JRE)
- JRE (alternative to COMPAT, but disfavored over COMPAT)
By specifying the lookup order of locale providers, we can get the same results as before without changing a single line of code. This might be the quickest fix available, but relying on a compatibility mode will most likely become a problem.
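To verify which provider order a JVM was actually started with, a tiny sketch like this one can be dropped into a project. The class and method names are hypothetical; it only reads the java.locale.providers system property, which is unset when the JDK default order applies.

```java
class LocaleProviderCheck {

    // Returns the configured provider order, or a note if the JDK default applies.
    static String configuredProviders() {
        String value = System.getProperty("java.locale.providers");
        return value == null ? "(not set, JDK default order)" : value;
    }

    public static void main(String[] args) {
        System.out.println("Java version:     " + System.getProperty("java.version"));
        System.out.println("Locale providers: " + configuredProviders());
    }
}
```

Started with java -Djava.locale.providers=COMPAT,CLDR LocaleProviderCheck, it should print the configured order. The property is best set on the command line, since setting it after the locale data has been loaded may have no effect. Newer JDK releases deprecated COMPAT (JDK 21) and later removed it (JDK 23, as far as I know), so this flag is a stopgap at best.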
Looking at the bugs in JDK-8145136 and their related sub-reports, it seems that the risk and impact of switching the default provider weren’t classified as high as they should have been.
That’s actually quite understandable.
Depending on the locales you use, you might not even be affected by any problems.
In my opinion, we should try to adapt to the changes as much as possible, or any fix becomes technical debt in the long run. JDK 9 was a massive release, with many breaking changes. So it’s unlikely everything will work out-of-the-box in a bigger project. Fixing locale-based formats should definitely be on our to-do list.