Putting Zürich before Århus
By Mark Davis, International Software Architect
Until now, it has been very difficult for web application designers to do something as simple
as sort names correctly according to the user's language. And it matters: English readers
wouldn’t expect Århus to sort below Zürich, but Danish speakers would.
Because linguistic sorting requires a sophisticated algorithm and lots of data, it was
impractical to do natively in JavaScript. The only full solution for sorting on the client
side was to generate a sort key on a server for every string that needed to be sorted, and
send the base64-encoded sort keys down to the client along with the strings. Pretty ugly! And
what’s doubly frustrating is that the underlying operating systems have all been able to
handle this, whether through International Components for Unicode (ICU) or Windows APIs.
The new internationalization specification for ECMAScript (the “official” name for JavaScript)
changes this picture. It is already in the production version of Chrome, and is on track for
other major browsers.
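To make the opening example concrete, here is a minimal sketch of sorting the same names for English and for Danish readers with `Intl.Collator`. The helper name `sortNames` and the sample list are made up for illustration, and the Danish result assumes the browser ships Danish collation data.

```javascript
// Hypothetical helper: sort a list of names for a given locale.
function sortNames(names, locale) {
  const collator = new Intl.Collator(locale);
  // Copy first so the caller's array is left untouched.
  return names.slice().sort(collator.compare);
}

const cities = ['Zürich', 'Århus', 'Berlin'];

// English treats Å as an accented A, so Århus comes first.
const english = sortNames(cities, 'en');

// Danish treats Å as a letter after Z, so Århus comes last.
const danish = sortNames(cities, 'da');

console.log(english);
console.log(danish);
```

If a locale is not supported, the collator quietly falls back to another one, so it can be worth checking `Intl.Collator.supportedLocalesOf` before relying on a particular order.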
Linguistic sorting is not the only benefit. Users will be able to see not only names sorted
correctly, but also correctly formatted numbers (“1,234.56” in English, but “1.234,56” in
German), dates (“March 10, 2012” vs “10. März 2012”), and so on. While the results might not
be precisely the same in every browser, they should be appropriate to the language, and are
returned using a uniform API.
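The number and date examples above can be reproduced with a short sketch using `Intl.NumberFormat` and `Intl.DateTimeFormat`. The variable names are illustrative only, and the German output assumes the browser has German locale data; as noted, exact results may vary a little between implementations.

```javascript
const amount = 1234.56;

// Same number, two locales.
const enNumber = new Intl.NumberFormat('en-US').format(amount);
const deNumber = new Intl.NumberFormat('de-DE').format(amount);

// March 10, 2012. Months are zero-based, and the time zone is pinned to UTC
// so the formatted day does not shift with the reader's local time zone.
const when = new Date(Date.UTC(2012, 2, 10));
const dateOptions = { year: 'numeric', month: 'long', day: 'numeric', timeZone: 'UTC' };

const enDate = new Intl.DateTimeFormat('en-US', dateOptions).format(when);
const deDate = new Intl.DateTimeFormat('de-DE', dateOptions).format(when);

console.log(enNumber, '|', deNumber);
console.log(enDate, '|', deDate);
```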
On any enabled browser — in its supported languages — web application developers can:
- compare strings correctly: choosing whether or not to
ignore accents, case differences, etc.
- format numbers correctly: choosing decimal places, currencies,
whether to use thousands-separator, etc.
- format dates and times correctly: choosing 12- or 24-hour time,
numeric vs named months, etc.
- match locales: comparing the user’s desired locales (say Arabic
and French) against the supported locales (say French, German, and English), to get the best
match.
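Each item in the list maps onto an option or method in the API. The sketch below touches one of each; the sample strings and variable names are invented for illustration. Note that the built-in matching works against the locales the browser itself supports, which is an assumption to keep in mind if your application has its own, shorter list.

```javascript
// Compare strings while ignoring accent and case differences.
const loose = new Intl.Collator('en', { sensitivity: 'base' });
const sameWord = loose.compare('resume', 'Résumé') === 0;

// Format numbers: a currency amount, and a number without thousands separators.
const price = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'USD'
}).format(1234.5);
const ungrouped = new Intl.NumberFormat('en-US', { useGrouping: false }).format(1234.56);

// Format dates: a numeric month instead of a named one.
const sampleDate = new Date(Date.UTC(2012, 2, 10));
const numericDate = new Intl.DateTimeFormat('en-US', {
  year: 'numeric',
  month: 'numeric',
  day: 'numeric',
  timeZone: 'UTC'
}).format(sampleDate);

// Match locales: which of the user's preferences can this browser serve,
// and which locale does a formatter actually end up using?
const requested = ['ar', 'fr'];
const usable = Intl.NumberFormat.supportedLocalesOf(requested);
const chosen = new Intl.NumberFormat(requested).resolvedOptions().locale;

console.log(sameWord, price, ungrouped, numericDate);
console.log(usable, chosen);
```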
The API also allows for linguistic support in offline web applications, which wasn’t
practical before. It builds on the industry standards BCP47 (for identifying languages and
locales) and LDML (part of the Unicode Common Locale Data Repository (CLDR) project). For the
gory details of the spec, see ECMA-402: ECMAScript Internationalization API Specification
(just approved by the Ecma General Assembly).
Mark Davis is president and cofounder of the Unicode Consortium, and founder of ICU and
CLDR. Mark is fond of food, film, travel, and RPGs. Mark lived for 4 years in Switzerland, and
is moving back in February.
Posted by Scott Knaster, Editor