Localization Changes in Java 9
We use java.util.Locale to format dates, numbers, currency, and more.
But in some circumstances, these formatted strings have changed with JDK 9, leading to a multitude of subtle (and sometimes not so subtle) bugs.
What Are Locales?
To better understand a problem, we must first know what we’re dealing with.
“A locale is a set of parameters that defines the user’s language, region, and any special variant preferences that the user wants to see in her user interface.”
— Wikipedia
The core use of a locale is formatting many different kinds of data for output:
- Numbers
- Date, time, weekdays, eras, etc.
- Currency
- String collation
- etc.
Locales are defined by a hierarchy of properties:
These three parameters are enough to define most combinations possible. Often a language is enough, but sometimes we need to be more specific. The variant is used to specify a locale even further:
And we’re not limited to the predefined set; custom locales are also supported:
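As a minimal sketch of how these properties map onto java.util.Locale, the following builds locales of increasing specificity. The concrete tags (de-AT, zh-Hant-TW, de-DE-1996, tlh) and the class name LocaleExamples are chosen purely for illustration and are not taken from the original examples.

```java
import java.util.Locale;

class LocaleExamples {

    // Language only (ISO 639): German, with no regional information.
    static Locale languageOnly() {
        return Locale.forLanguageTag("de");
    }

    // Language plus country/region (ISO 3166): German as used in Austria.
    static Locale languageAndCountry() {
        return new Locale.Builder().setLanguage("de").setRegion("AT").build();
    }

    // Language, script (ISO 15924), and region: Traditional Chinese in Taiwan.
    static Locale withScript() {
        return Locale.forLanguageTag("zh-Hant-TW");
    }

    // A variant narrows the locale further; "1996" is the subtag
    // for the German orthography of 1996.
    static Locale withVariant() {
        return new Locale.Builder()
                .setLanguage("de")
                .setRegion("DE")
                .setVariant("1996")
                .build();
    }

    // A locale without a predefined constant: Klingon ("tlh"), for illustration only.
    static Locale custom() {
        return new Locale.Builder().setLanguage("tlh").build();
    }

    public static void main(String[] args) {
        System.out.println(languageOnly().toLanguageTag());       // de
        System.out.println(languageAndCountry().toLanguageTag()); // de-AT
        System.out.println(withScript().toLanguageTag());         // zh-Hant-TW
        System.out.println(withVariant().toLanguageTag());        // de-DE-1996
        System.out.println(custom().toLanguageTag());             // tlh
    }
}
```

Locale.Builder and Locale.forLanguageTag are used here instead of the Locale constructors because the constructors are deprecated in recent JDK releases.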
Where Do Locales Come from?
A well-established source for locales is the “Unicode Common Locale Data Repository” (CLDR). It’s the world’s largest and most extensive repository of locale data available and is used by operating systems, Google Chrome, MediaWiki, etc.
But just like with any standard, it’s not the only source for locales.
JDK Default Locale Provider
Java retrieves locales from the LocaleProviderAdapter.
Up until JDK 8, the default provider was JRE, its own list of locale definitions.
The JRE option isn’t the only provider available in the JDK:
- JRE: Java’s own list (named COMPAT since JDK 9)
- CLDR: Unicode Common Locale Data Repository
- SPI: Service Provider Interface
- HOST: provided by the operating system
What Changed With JDK 9?
JEP 252 changed the default provider from JRE to CLDR, starting with JDK 9. This change resulted in minuscule differences, which could cause big problems down the line.
These are the problems we’ve encountered (so far) upgrading to JDK 11.
Different date formats
The first thing that broke for us was (de-)serializing JSON data with Google’s GSON.
We use a custom-built sync service for Mailchimp subscribers that persists relevant data. With JDK 11, we could no longer read previously persisted data.
After building some test scenarios, we identified a slight difference in the date format:
Notice the additional comma after the year? Where did that suddenly come from?
It turns out that CLDR defines some formats slightly differently than the JRE provider did before.
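To see the difference on a given JDK, a small check like the following can help. It is a sketch that assumes the affected code relies on the default date-time style of java.text.DateFormat for Locale.US; the class and method names are made up for this example. Running it on JDK 8 and on JDK 9 or later should make the changed pattern visible, though the exact output depends on the JDK version and the active locale provider.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatCheck {

    // Formats the Unix epoch with the locale's default date-time style.
    // The time zone is pinned to UTC so the output does not depend on the machine.
    static String formatEpoch(Locale locale) {
        DateFormat format = DateFormat.getDateTimeInstance(
                DateFormat.DEFAULT, DateFormat.DEFAULT, locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(0L));
    }

    public static void main(String[] args) {
        // Print the JDK version next to the result to compare runs on different JDKs.
        System.out.println(System.getProperty("java.version")
                + ": " + formatEpoch(Locale.US));
    }
}
```

Because the formatted string is provider-dependent, persisted data is safer with an explicit pattern or an ISO 8601 representation than with a locale's default style.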
When does a week start?
Another problem came from the first day of the week.
We’re using Apache Tapestry for most of our systems, with “language-only” locales for maximal compatibility.
This means we’re using Locale.GERMAN (de) instead of Locale.GERMANY (de_DE).
There wasn’t any problem before. Everything was formatted the way it was supposed to look and the way we were used to. And the first day of the week was Monday, as expected.
After our switch from JDK 8 to 11, our date pickers started on Sunday instead.
Debugging through the related code revealed that the locale was still Locale.GERMAN.
But the start day of the week was returned as Sunday.
What happened?
A subtle difference between CLDR and the JRE provider is how the different kinds of formats relate to a locale’s specificity.
JRE provided Monday as the start day of the week for Locale.GERMAN, even though it shouldn’t have!
CLDR doesn’t define a start day of the week for a “language-only” locale, simply because calendars shouldn’t be treated as language-dependent.
Instead, a region (or, in Java terms, a “country”) is responsible for defining calendars, including the start day of the week.
So our “language-only” German locale suddenly fell back to the default (country/territory 001), which gave us Sunday.
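A quick way to check which start day a locale resolves to is shown below. This sketch uses java.time.temporal.WeekFields; the code that originally broke may well have used java.util.Calendar instead, and the class name is hypothetical. The result for the language-only locale is deliberately not annotated, since it depends on the JDK version and the active locale provider.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

class FirstDayOfWeekCheck {

    // Resolves the first day of the week from the locale data of the running JDK.
    static DayOfWeek firstDayOfWeek(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // Language-only locale: the result depends on the locale provider in use.
        System.out.println("de:    " + firstDayOfWeek(Locale.GERMAN));
        // Locale with a country: the region defines the calendar.
        System.out.println("de_DE: " + firstDayOfWeek(Locale.GERMANY));
        System.out.println("en_US: " + firstDayOfWeek(Locale.US));
    }
}
```

If a date picker has to start on Monday for German users, passing a locale that includes the country (for example Locale.GERMANY) avoids relying on a fallback.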
Currency formats
We have a simple unit test for currency formatting:
It ran fine before, but now fails because the currency format changed significantly:
Just like with the start of the week, the problem was the “language-only” locale. But this time, it’s the other way around.
CLDR does provide a default format for de-based locales, along with more specific patterns for regions such as de_AT.
The JRE provider, on the other hand, didn’t have a pattern for de alone, so it fell back to its default.
To make matters worse, a no-break space (U+00A0) is now used to separate the value from the symbol.
If we had used Locale.GERMANY instead of Locale.GERMAN with JDK 8, we would have had almost the same result as with JDK 11.
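The following sketch shows one way to compare the two locales and to make tests robust against the no-break space. It is not the original unit test; the class name, the helper normalizeSpaces, and the choice to set EUR explicitly are assumptions made for this example.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

class CurrencyFormatCheck {

    // Formats an amount as euros using the currency pattern of the given locale.
    // EUR is set explicitly because a language-only locale has no country
    // from which a currency could be derived.
    static String formatEuro(Locale locale, double amount) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance("EUR"));
        return format.format(amount);
    }

    // Replaces no-break spaces (U+00A0, U+202F) with regular spaces,
    // so expected strings in tests can be written with plain spaces.
    static String normalizeSpaces(String text) {
        return text.replace('\u00A0', ' ').replace('\u202F', ' ');
    }

    public static void main(String[] args) {
        // The exact pattern depends on the JDK version and locale provider.
        System.out.println("de:    " + normalizeSpaces(formatEuro(Locale.GERMAN, 1234.5)));
        System.out.println("de_DE: " + normalizeSpaces(formatEuro(Locale.GERMANY, 1234.5)));
    }
}
```

Normalizing whitespace keeps assertions readable, but it only belongs in test code; user-facing output should keep the no-break space so the amount and symbol are not split across lines.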
How To Deal With the Changes
Now that we found multiple breaking changes in our formatting code, how should we deal with them?
Update our dependencies
Fixing our GSON problem was as easy as updating the dependency. The issue was fixed in 2.8.3, and we were using 2.8.2.
After some time in Maven dependency hell, (de-)serializing was working again flawlessly.
Adapt to the changes
Wherever possible, we should embrace the changes in our code.
The locale definitions by CLDR are more accurate and better thought out than before.
By relying on a well-defined standard that many different systems use, we get better interoperability and a more future-proof setup. If necessary, we could still wrap non-fixable code in a compatibility wrapper of our own, or provide different date formats ourselves.
Compatibility mode
In the world of software development, being technically correct is sometimes just not possible. Think of legacy code, irreplaceable dependencies, closed systems we have to interact with, etc. The list of reasons why we can’t merely adapt to specific changes is long.
But the JDK has our back if we can’t adapt to them, in the form of the system property java.locale.providers.
By specifying the lookup order of locale providers, we can get the same results as before without changing a single line of code. This might be the quickest fix available, but relying on a compatibility mode will most likely become a problem.
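The property is usually passed on the command line when starting the JVM. According to JEP 252, listing COMPAT (the JDK 9+ name for the old JRE data) ahead of CLDR restores the previous lookup order. The sketch below only reports what the running JVM was started with; the class name and the fallback text are made up for this example.

```java
class LocaleProviderInfo {

    // Returns the configured provider order, or a hint if the property is not set.
    // Example JVM argument (order as suggested by JEP 252 for JDK 8 behavior):
    //   java -Djava.locale.providers=COMPAT,CLDR LocaleProviderInfo
    static String configuredProviders() {
        return System.getProperty("java.locale.providers",
                "<not set, JDK default order applies>");
    }

    public static void main(String[] args) {
        System.out.println("java.version:          " + System.getProperty("java.version"));
        System.out.println("java.locale.providers: " + configuredProviders());
    }
}
```

The property is best set at JVM startup rather than from application code, because locale data may already have been loaded by the time the application runs.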
Conclusion
Looking at the bugs in JDK-8008577, JDK-8145136, and their related sub-reports, it seems that the risk and impact of switching the default provider weren’t rated as high as they should have been.
That’s actually quite understandable.
Depending on the locales you use, you might not be affected at all.
In my opinion, we should try to adapt to the changes as much as possible; otherwise, any workaround becomes technical debt in the long run. JDK 9 was a massive release with many breaking changes, so it’s unlikely that everything will work out of the box in a bigger project. Fixing locale-based formats should definitely be on our to-do list.
Resources
- Locale (computer software) (Wikipedia)
- JDK 9 Release Notes (Oracle)
- JEP 252: Use CLDR Locale Data by Default (OpenJDK)
- Internationalization Enhancements in JDK 9 (Oracle)
- Unicode Common Locale Data Repository
- ISO 639 (Wikipedia)
- ISO 15924 (Wikipedia)
- ISO 3166 (Wikipedia)
- UN M.49 (Wikipedia)