We use java.util.Locale to format dates, numbers, currency, and more. But in some circumstances, these formatted strings have changed with JDK 9, leading to a multitude of subtle (and sometimes not so subtle) bugs.


What Are Locales?

To better understand a problem, we must first know what we’re dealing with.

“A locale is a set of parameters that defines the user’s language, region, and any special variant preferences that the user wants to see in her user interface.”
Wikipedia

The core use of a locale is formatting different kinds of data for output:

  • Numbers
  • Date, time, weekdays, eras, etc.
  • Currency
  • String collation
  • etc.

Locales are defined by a hierarchy of properties:

<language>[_<COUNTRY>[_<variant>]]

language:  ISO 639 (mandatory)  
COUNTRY:   ISO 3166 (optional)  
variant:   any string (optional)

These three parameters are enough to define most possible combinations. Often a language is enough, but sometimes we need to be more specific and add a country. The variant narrows a locale down even further. Some of the predefined constants:

Locale.GERMAN        => de
Locale.GERMANY       => de_DE

Locale.ENGLISH       => en
Locale.UK            => en_GB
Locale.CANADA        => en_CA

Locale.FRENCH        => fr
Locale.CANADA_FRENCH => fr_CA
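The predefined constants are ordinary Locale objects, so their parts can be read back with getLanguage(), getCountry(), and getVariant(). A small sketch (LocaleParts and describe are illustrative names, not JDK API):

```java
import java.util.Locale;

public class LocaleParts {

    // Describes a locale by its language, country, and variant parts.
    public static String describe(Locale locale) {
        return "language=" + locale.getLanguage()
             + ", country=" + locale.getCountry()
             + ", variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // Language-only locale: country and variant are empty strings
        System.out.println(Locale.GERMAN + " -> " + describe(Locale.GERMAN));
        // Language plus country
        System.out.println(Locale.CANADA_FRENCH + " -> " + describe(Locale.CANADA_FRENCH));
        // BCP 47 language tags use a hyphen instead of an underscore
        System.out.println(Locale.forLanguageTag("de-DE").equals(Locale.GERMANY));
    }
}
```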

And we’re not limited to the predefined set; custom locales are also supported:

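import java.util.Locale;

// Custom locale for my home town.
// Locale.Builder validates the variant against BCP 47 (5 to 8 alphanumerics),
// so setVariant("Heidelberg") would throw an IllformedLocaleException.
// The three-argument constructor accepts any variant string
// (deprecated since JDK 19 in favor of Locale.of, but fine on JDK 11).
var customLocale = new Locale("de", "DE", "Heidelberg");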

Where Do Locales Come From?

A well-established source for locales is the “Unicode Common Locale Data Repository” (CLDR). It’s the largest and most extensive repository of locale data available, and it’s used by operating systems, Google Chrome, MediaWiki, and more.

But just like with any standard, it’s not the only source for locales.

XKCD: Standards (https://xkcd.com/927/)

JDK Default Locale Provider

Java retrieves locale data through its LocaleProviderAdapter. Up to and including JDK 8, the default provider was JRE, Java’s own set of locale definitions.

JRE isn’t the only provider available in the JDK:

  • JRE (Java’s own list)
  • CLDR (Unicode Common Locale Data Repository)
  • SPI (Service Provider Interface)
  • Host (provided by the operating system)

What Changed With JDK 9?

JEP 252 changed the default provider from JRE to CLDR, starting with JDK 9. This change resulted in minuscule differences, which could cause big problems down the line.

These are the problems we’ve encountered (so far) while upgrading to JDK 11.

Different date formats

The first thing that broke for us was (de-)serializing JSON data with Google’s GSON.

We use a custom-built sync service for Mailchimp subscribers that persists the relevant data. With JDK 11, we could no longer read previously persisted data.

After building some test scenarios, we identified a slight difference in the date format:

Before: Jul 15, 2020 2:20:32 AM  
After:  Jul 15, 2020, 2:20:32 AM

Notice the additional comma after the year? Where did that suddenly come from?

It turns out that CLDR defines some formats slightly differently than JRE did before.
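A minimal sketch to reproduce the difference, assuming the strings above come from DateFormat’s DEFAULT date and time styles for Locale.US (DefaultDateFormatDemo and formatDefault are made-up names). The time zone is pinned to UTC so the output doesn’t depend on the machine:

```java
import java.text.DateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DefaultDateFormatDemo {

    // Formats a date with the DEFAULT date and time styles for Locale.US.
    // The pattern behind these styles comes from the active locale provider.
    public static String formatDefault(Date date) {
        DateFormat format = DateFormat.getDateTimeInstance(
                DateFormat.DEFAULT, DateFormat.DEFAULT, Locale.US);
        // Pin the time zone so the result doesn't depend on the machine's settings
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(date);
    }

    public static void main(String[] args) {
        Date date = Date.from(Instant.parse("2020-07-15T02:20:32Z"));
        // JDK 8 (JRE):   Jul 15, 2020 2:20:32 AM
        // JDK 11 (CLDR): Jul 15, 2020, 2:20:32 AM
        System.out.println(formatDefault(date));
    }
}
```

Running the same class on JDK 8 and JDK 11 should show the missing and the additional comma. Newer JDKs may also change the whitespace before AM, so string comparisons against stored values stay fragile.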

When does a week start?

Another problem came from the first day of the week.

We’re using Apache Tapestry for most of our systems, with “language-only” locales for maximal compatibility. This means we’re using Locale.GERMAN (de) instead of Locale.GERMANY (de_DE).

There wasn’t any problem before. Everything was formatted the way it was supposed to look, and the way we were used to. And the first day of the week was Monday, as expected.

After our switch from JDK 8 to 11, our date pickers started on Sunday instead. Debugging through the related code revealed that the locale was still Locale.GERMAN. But the start day of the week was returned as Sunday.

What happened?

A subtle difference between CLDR and the JRE provider is how the different kinds of formats relate to a locale’s specificity.

JRE provided Monday as the start day of the week for Locale.GERMAN, even though it shouldn’t have!

The CLDR doesn’t have a start day of the week for a “language-only” locale, simply because calendars shouldn’t be treated as language-dependent. Instead, a region (or, in Java terms, a “country”) is responsible for defining calendars.

Including the start day of a week.

So our “language-only” German locale suddenly fell back to Java’s built-in default: Sunday.
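The difference can be checked with java.time.temporal.WeekFields, which relies on the same locale calendar data. A small sketch (FirstDayOfWeekDemo and firstDayOf are hypothetical names); only the de_DE result is defined by a region, while the result for the language-only locale depends on the JDK version and locale provider:

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class FirstDayOfWeekDemo {

    // Returns the first day of the week as defined for the given locale.
    public static DayOfWeek firstDayOf(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // Language-only locale: no region, so no region-specific week data
        System.out.println("de:    " + firstDayOf(Locale.GERMAN));
        // Language plus country: the region DE defines Monday
        System.out.println("de_DE: " + firstDayOf(Locale.GERMANY));
    }
}
```

Using region-specific locales such as Locale.GERMANY for anything calendar-related avoids depending on the fallback altogether.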

Currency formats

We have a simple unit test for currency formatting:

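import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class CurrencyFormatTest {

    // The test method references the data provider by name
    @Test(dataProvider = "currencies")
    public void testCurrencyFormat(String currencyCode,
                                   Locale locale,
                                   BigDecimal value,
                                   String expectedResult) {

        // ARRANGE
        var format = NumberFormat.getCurrencyInstance(locale);
        var currency = Currency.getInstance(currencyCode);
        format.setCurrency(currency);

        // ACT
        String result = format.format(value);

        // ASSERT
        Assert.assertEquals(result, expectedResult);
    }

    // Expected values match the JDK 8 (JRE provider) output
    @DataProvider(name = "currencies")
    public Iterator<Object[]> provideTestData() {
        var value = new BigDecimal("1234.99");
        var locale = Locale.GERMAN;
        return List.of(new Object[] { "EUR", locale, value, "€ 1.234,99" },
                       new Object[] { "CHF", locale, value, "CHF 1.234,99" },
                       new Object[] { "USD", locale, value, "$ 1.234,99" },
                       new Object[] { "JPY", locale, value, "¥ 1.234,99" })
                   .iterator();
    }
}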

It ran fine before, but now fails because the currency format changed significantly:

 JDK 8      | JDK 9+
------------|----------------- 
 ¤ 1.234,99 | 1.234,99\u00a0¤

 ¤ = Unicode Currency Symbol Placeholder

Just like with the start of the week, the problem was the “language-only” locale. But this time, it’s the other way around.

CLDR does provide a default format for the language-only de locale, plus more specific patterns for regions, e.g., de_AT.

JRE, on the other hand, didn’t have a pattern for de alone, so it fell back to its default.

And to make it worse, a no-break space (U+00A0) is now used to separate the value from the symbol.

If we had used Locale.GERMANY instead of Locale.GERMAN with JDK 8, we would have had almost the same result as with JDK 11.
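To see both effects at once, this sketch formats the same amount in euros for de and de_DE and makes the no-break space visible. CurrencyFormatDemo, formatEuro, and showSpaces are illustrative names:

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencyFormatDemo {

    // Formats a value as euros using the locale's currency pattern.
    public static String formatEuro(Locale locale, BigDecimal value) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        format.setCurrency(Currency.getInstance("EUR"));
        return format.format(value);
    }

    // Replaces no-break spaces with a visible marker for printing.
    public static String showSpaces(String s) {
        return s.replace("\u00a0", "<NBSP>");
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1234.99");
        System.out.println("de:    " + showSpaces(formatEuro(Locale.GERMAN, value)));
        System.out.println("de_DE: " + showSpaces(formatEuro(Locale.GERMANY, value)));
    }
}
```

Comparing the printed strings across JDK versions (or locale providers) shows exactly where a hard-coded expectation like "€ 1.234,99" breaks.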


How To Deal With the Changes

Now that we’ve found multiple breaking changes in our formatting code, how should we deal with them?

Update our dependencies

Fixing our GSON problem was as easy as updating the dependency. The issue was fixed in 2.8.3, and we were using 2.8.2.

After some time in Maven dependency hell, (de-)serializing was working again flawlessly.

Adapt to the changes

Wherever possible, we should embrace the changes in our code. The locale definitions by CLDR are more accurate and better thought-out than before.

Relying on a well-defined standard that many different systems use improves interoperability and is more future-proof. If necessary, we can still wrap code we can’t fix in a compatibility layer of our own, or provide explicit date formats ourselves.
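One way to provide date formats ourselves is to stop relying on the locale’s default styles and use explicit patterns instead. The sketch below assumes the persisted strings look like the two examples earlier; LegacyDateParser is a hypothetical helper, not part of Gson or the JDK. It writes the JDK 8 style and reads both the old and the new style:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;

public class LegacyDateParser {

    // Explicit patterns don't depend on the provider's DEFAULT styles.
    // Locale.US is still needed for the month abbreviation and the AM/PM marker.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.US),   // JDK 8 style
            DateTimeFormatter.ofPattern("MMM d, yyyy, h:mm:ss a", Locale.US)); // JDK 9+ style

    // Writing always uses the first (legacy) pattern, so stored data stays stable.
    public static String format(LocalDateTime dateTime) {
        return FORMATS.get(0).format(dateTime);
    }

    // Reading tries each pattern in order and accepts the first match.
    public static LocalDateTime parse(String text) {
        for (DateTimeFormatter formatter : FORMATS) {
            try {
                return LocalDateTime.parse(text, formatter);
            } catch (DateTimeParseException e) {
                // Not this pattern; try the next one
            }
        }
        throw new DateTimeParseException("Unsupported date format", text, 0);
    }

    public static void main(String[] args) {
        System.out.println(parse("Jul 15, 2020 2:20:32 AM"));
        System.out.println(parse("Jul 15, 2020, 2:20:32 AM"));
        System.out.println(format(LocalDateTime.of(2020, 7, 15, 2, 20, 32)));
    }
}
```

Accepting both variants on read while writing a single fixed format lets old and new data coexist without forcing a migration.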

Compatibility mode

In the world of software development, being technically correct is sometimes just not possible. Think of legacy code, irreplaceable dependencies, closed systems we have to interact with, etc. The list of reasons why we can’t merely adapt to specific changes is long.

But the JDK has our back if we can’t adapt to them: the system property java.locale.providers:

java -Djava.locale.providers=<order of providers>

Available providers:

- CLDR (default)
- HOST (OS provided)
- SPI (Service Provider Interface)
- COMPAT (formerly JRE)
- JRE (deprecated alias for COMPAT)

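By specifying the lookup order of locale providers, for example -Djava.locale.providers=COMPAT,CLDR, we can get the same results as before without changing a single line of code. Locales that the first provider doesn’t support fall back to the next one in the list. This might be the quickest fix available, but relying on a compatibility mode will most likely become a problem in the long run.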
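A quick way to verify which provider order a JVM actually uses is to print the property next to a locale-sensitive format. A small sketch (LocaleProviderCheck and configuredProviders are illustrative names); run it with and without the flag and compare the output:

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class LocaleProviderCheck {

    // Returns the configured provider order, or a note that the JDK default applies.
    public static String configuredProviders() {
        String providers = System.getProperty("java.locale.providers");
        return providers != null ? providers : "(not set, JDK default)";
    }

    public static void main(String[] args) {
        System.out.println("java.locale.providers = " + configuredProviders());
        // A locale-sensitive sample whose pattern differs between providers
        NumberFormat format = NumberFormat.getCurrencyInstance(Locale.GERMAN);
        format.setCurrency(Currency.getInstance("EUR"));
        System.out.println(format.format(1234.99));
    }
}
```

On JDK 11, java -Djava.locale.providers=COMPAT,CLDR LocaleProviderCheck.java runs the single source file with the JDK 8 style data looked up first.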


Conclusion

Looking at the bugs in JDK-8008577, JDK-8145136, and their related sub-reports, it seems that the risk and impact of switching the default provider weren’t classified as high as they should have been.

That’s quite understandable, though.

Depending on the locales you use, you might not be affected by any of these problems at all.

In my opinion, we should adapt to the changes as much as possible; otherwise, any workaround becomes technical debt in the long run. JDK 9 was a massive release with many breaking changes, so it’s unlikely everything will work out of the box in a bigger project. Fixing locale-based formats should definitely be on our to-do list.

Resources