bdunagan

fill the void - bdunagan

01 Sep 2014
Rails internationalization at scale

Retrospect.com is around 400 pages, including the main site, knowledge base, blog, and documentation. Our software supports six languages–English, German, Spanish, French, Italian, and Japanese–and the site does as well. We wrote it in Rails three years ago, but the initial version–then 200 pages–had separate pages per locale, using the page.en.html.erb format. Any HTML change needed to be duplicated across six files, and localizers introduced the occasional HTML bug.

When the site began to grow, separate language files became unsustainable. But with no prior experience, the best solution was not obvious to me. I’ll go through what failed and what scaled.


Naive translate

I started very simple:

# html.erb
<%= t("I wrapped t() around every phrase. That didn't work.") %>
# html
"That Didn't Work"

Yes, the completely naive approach. I just wrapped t() around everything. t() is shorthand for I18n.translate(), and I18n is the Rails gem for internationalization. Wrapping a phrase works; wrapping a paragraph does not. The difference is that a paragraph includes periods, and I18n treats periods as delimiters. t() treats its argument as a key, and periods scope that key. t() then looks up the key in config/locales/*.yml.

Here is a simple example: localizing a welcome message.

# en.yml
en:
  homepage:
    welcome: "This is the welcome message."

# es.yml
es:
  homepage:
    welcome: "Este es el mensaje de bienvenida."
# html.erb
<%= t("homepage.welcome") %>
# html for :en locale
"This is the welcome message."
# html for :es locale
"Este es el mensaje de bienvenida."

In this example, t() looked up “homepage.welcome” in en.yml. It found the “homepage” block then the “welcome” key and returned its value. Changing the locale changes the YAML file used during lookup.
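The lookup can be imitated in plain Ruby to make the scoping concrete. This is only a sketch: `lookup` and `TRANSLATIONS` are names made up for illustration, the YAML is inlined rather than read from config/locales, and the real I18n backend does considerably more (interpolation, pluralization, fallbacks).

```ruby
require "yaml"

# Inline stand-ins for the en.yml and es.yml files shown above.
TRANSLATIONS = YAML.load(<<~YML)
  en:
    homepage:
      welcome: "This is the welcome message."
  es:
    homepage:
      welcome: "Este es el mensaje de bienvenida."
YML

# Hypothetical helper: descend into the nested hash one
# period-delimited segment at a time, starting at the locale.
def lookup(locale, key)
  TRANSLATIONS.dig(locale.to_s, *key.split("."))
end

puts lookup(:en, "homepage.welcome") # This is the welcome message.
puts lookup(:es, "homepage.welcome") # Este es el mensaje de bienvenida.
```

The only thing that changes between the two calls is the top-level locale; the key stays the same.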

Back to the original example, wrapping phrases with t() does not scale because of the period delimiters. “That Didn’t Work” appears in the HTML because Rails looks up “I wrapped t() around every phrase” then “ That didn’t work” then “”. None exist in en.yml, but Rails has a coping strategy to make me look a little less stupid: capitalization. It ignores the final empty string, takes the last non-blank part of the period-delimited key, and returns a capitalized version of it. Hence “That Didn’t Work”.
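Here is a rough imitation of that behavior in plain Ruby. It is not the Rails implementation; `split_key` and `missing_key_label` are hypothetical names, and the capitalization is a simple word-by-word approximation of what the view helper does.

```ruby
# Split a key on the separator and drop blank segments,
# the way a period-delimited key gets scoped.
def split_key(key, separator = ".")
  key.split(separator).reject { |part| part.strip.empty? }
end

# Approximate the label shown for a missing key:
# take the last segment and capitalize each word.
def missing_key_label(key)
  split_key(key).last.strip.split(" ").map(&:capitalize).join(" ")
end

puts missing_key_label("I wrapped t() around every phrase. That didn't work.")
# That Didn't Work
```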

That was my issue. I needed a solution beyond Rails’ toy examples, something that scaled to thousands of strings across hundreds of pages. I did experiment with changing default_separator for I18n, but that hack didn’t resolve the consolidation issue. I needed all the strings from the localized pages to be values in a lookup.


Scale with YAMLs

Of course, there was no simple solution. For a very tedious week, I migrated 10k strings into their respective localized YAML files across hundreds of pages. For each, I created a delimited key based on their page, like “products.win.features.platform_support”.

However, I did not use a single locale file. Instead, I created a folder structure under config/locales to separate the sections of the website, like so:

config
|--locales
|----blog
|------blog.en.yml
|------blog.es.yml
|----products
|------win.en.yml
|------win.es.yml

Thanks to a helpful tip, I added the following line to application.rb to ensure Rails read in the entire folder hierarchy:

config.i18n.load_path += 
  Dir[Rails.root.join('config', 'locales', '**/*.{rb,yml}').to_s]

With all of the localizations stored in config/locales, I could reduce the site to only one file per page. Site changes went from complicated six-fold operations to easy tweaks.
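To show why the folder layout does not matter to the lookup, here is a small sketch that reads every YAML file under a directory tree and merges them into one nested hash. It assumes nothing about Rails internals: `deep_merge` and `load_locale_tree` are hypothetical helpers, and the two files written to the temp directory are sample data.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

# Merge two nested hashes; on conflicting leaves the right side wins.
def deep_merge(left, right)
  left.merge(right) do |_key, a, b|
    a.is_a?(Hash) && b.is_a?(Hash) ? deep_merge(a, b) : b
  end
end

# Read every .yml file below root, however deeply nested, into one tree.
def load_locale_tree(root)
  Dir.glob(File.join(root, "**", "*.yml")).sort.reduce({}) do |tree, path|
    deep_merge(tree, YAML.load_file(path) || {})
  end
end

Dir.mktmpdir do |root|
  samples = {
    "blog/blog.en.yml"    => "en:\n  blog:\n    title: \"Blog\"\n",
    "products/win.en.yml" => "en:\n  products:\n    win:\n      title: \"Windows\"\n"
  }
  samples.each do |relative, body|
    path = File.join(root, relative)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, body)
  end

  tree = load_locale_tree(root)
  puts tree["en"].keys.sort.inspect # ["blog", "products"]
end
```

Both files contribute to the same "en" tree, so a key like "products.win.title" resolves no matter which file defined it.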

Presentation and content were separate.


Localization with Gengo

After switching to YAML files, localization was straightforward as well. We used to send our webpages to a large localization service. The translations would take weeks, cost quite a bit, and often introduce HTML bugs. With the site content in YAML files, I switched us to Gengo.

Gengo translates content within a few hours at low cost. The highest rate it charges is still a fourth of the cost of our previous localization service, and its turnaround is within the day. Gengo doesn’t support YAML files explicitly, but it allows text to be excluded from translation via “[[[three brackets]]]”. I wrote a short Ruby script to reformat our YAML files into a form Gengo accepts and to convert the translated result back into YAML.

# config/locales/products/win.yml
products:
  win:
    features:
      platform: "Text about platform support."

# gengo submission
[[[products.win.features.platform]]] Text about platform support.
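
The conversion might look something like the sketch below. This is not the script we run; `flatten_keys`, `to_gengo`, and `from_gengo` are hypothetical names, and it assumes every string fits on a single line.

```ruby
# Flatten a nested hash into [dotted_key, text] pairs.
def flatten_keys(tree, prefix = nil)
  tree.flat_map do |key, value|
    path = [prefix, key].compact.join(".")
    value.is_a?(Hash) ? flatten_keys(value, path) : [[path, value]]
  end
end

# One line per string, with the key protected by triple brackets.
def to_gengo(tree)
  flatten_keys(tree).map { |path, text| "[[[#{path}]]] #{text}" }.join("\n")
end

# Rebuild the nested hash from translated lines.
def from_gengo(text)
  text.each_line.each_with_object({}) do |line, tree|
    match = line.chomp.match(/\A\[\[\[(.+?)\]\]\]\s*(.*)\z/)
    next unless match

    *parents, leaf = match[1].split(".")
    node = parents.reduce(tree) { |hash, key| hash[key] ||= {} }
    node[leaf] = match[2].strip
  end
end

source = { "products" => { "win" => { "features" => { "platform" => "Text about platform support." } } } }
submission = to_gengo(source)
puts submission
# [[[products.win.features.platform]]] Text about platform support.
puts from_gengo(submission) == source # true
```

Because the keys ride along inside the brackets, the translated text can be mapped back to the right place in the YAML without any manual matching.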

Gengo is simply amazing. Compared to our previous service, it has saved us thousands of dollars and months of waiting. Other translation services, like Transifex and PhraseApp, use Gengo as their backend to professional translators. I highly recommend them. Getting a string translated for a couple dollars in an hour by a professional sounded absurd to us before we discovered Gengo.


Routes for locales

Consolidation is solved. Localization is solved. But we still need each site visitor to get the right language for a given page. The correct language requires the correct locale.

We chose a simple JavaScript-driven dropdown at the top right of each page in the header. A visitor can quickly switch to their own locale when they arrive at the site. More importantly, the locale sticks. (Sounds obvious, but many sites we’ve seen fail at this, quickly reverting back to English while we navigate around.)

Let’s walk through how a German would navigate the site:

  1. Google leads them to http://retrospect.com.
  2. They want to switch language and see the world icon at the top.
  3. They hover over it, see the menu, and select “Deutsch”.
  4. They’re redirected to http://retrospect.com/de.
  5. They click on “Mac” and go to http://retrospect.com/de/products/mac.

We could have stored the locale in a cookie, but we preferred making it an explicit part of the URL. Shared links retain their locale, and testing is simpler. To create routes for every locale, we use the following code:

# routes.rb
scope '(:locale)', :locale => /(de|en|es|fr|it|ja)(_[A-Z]{2}){0,1}/ do
  # Add your routes here.
end
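
To see what the optional locale segment accepts, the same pattern can be exercised outside of Rails. This is only an illustration: the router does this matching itself, `split_locale` is a hypothetical helper, and the "en" default is an assumption for the example.

```ruby
# The same locale pattern used in the route scope above.
LOCALE = /(de|en|es|fr|it|ja)(_[A-Z]{2}){0,1}/

# Optional leading locale segment, followed by the rest of the path.
LOCALIZED_PATH = %r{\A/(?<locale>#{LOCALE.source})(?=/|\z)(?<rest>.*)\z}

# Returns [locale, remaining_path]; falls back to the assumed default
# when the path does not start with a known locale.
def split_locale(path, default_locale = "en")
  match = LOCALIZED_PATH.match(path)
  return [default_locale, path] unless match

  [match[:locale], match[:rest].empty? ? "/" : match[:rest]]
end

p split_locale("/de/products/mac") # ["de", "/products/mac"]
p split_locale("/en_US/blog")      # ["en_US", "/blog"]
p split_locale("/products/mac")    # ["en", "/products/mac"]
```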

Like I said above, the locale stays with the visitor. We originally included :locale => I18n.locale in all of our URL/path calls, but that didn’t scale well as the site grew. Instead, we added the following code to include that key/value pair automatically in every URL:

# application_controller.rb
def default_url_options(options={})
  { :locale => I18n.locale }
end

# We use this method on Rails 3.2, but it seems to be deprecated.
# There are other equivalent methods regardless.
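
The idea behind default_url_options is just "defaults that every generated link inherits unless told otherwise". A toy version in plain Ruby, with a hypothetical `LinkBuilder` class standing in for the controller and Rails' URL helpers:

```ruby
# Hypothetical stand-in for a controller that knows the request's locale.
class LinkBuilder
  def initialize(current_locale)
    @current_locale = current_locale
  end

  # Defaults merged into every generated path.
  def default_url_options
    { locale: @current_locale }
  end

  # Explicit options win over the defaults.
  def path_for(path, options = {})
    locale = default_url_options.merge(options)[:locale]
    "/#{locale}#{path}"
  end
end

links = LinkBuilder.new(:de)
puts links.path_for("/products/mac")              # /de/products/mac
puts links.path_for("/products/mac", locale: :ja) # /ja/products/mac
```

Individual calls no longer need to mention the locale, yet the language dropdown can still override it.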

Rails I18n hints

We encountered a number of other issues during this process.

Safe HTML: For security reasons, Rails escapes HTML in text unless explicitly told otherwise with text.html_safe. However, t() treats any key ending in “_html” as HTML-safe, so with the key “homepage.this_html”, the value “<strong>this</strong>” renders as a bold “this” rather than as literal tags.
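
A plain-Ruby sketch of that rule, assuming a hand-rolled escaper in place of Rails' own; `html_key?` and `render_translation` are hypothetical names. Rails applies the same convention to keys whose last segment is exactly "html".

```ruby
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# A key is trusted when its last segment is "html" or ends in "_html".
def html_key?(key)
  last = key.split(".").last
  last == "html" || last.end_with?("_html")
end

# Escape the value unless the key marks it as HTML.
def render_translation(key, value)
  html_key?(key) ? value : value.gsub(/[&<>"']/, HTML_ESCAPES)
end

puts render_translation("homepage.this", "<strong>this</strong>")
# &lt;strong&gt;this&lt;/strong&gt;
puts render_translation("homepage.this_html", "<strong>this</strong>")
# <strong>this</strong>
```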

Locale fallback: I18n has a configuration for falling back to the default language: config.i18n.fallbacks in application.rb. (Use config.i18n.default_locale to define the default language.) If a translation for a certain language is missing, t() looks up the same key in the default locale and uses it, so the site never looks wrong, just untranslated. (And as I mentioned in the first example, if t() fails to find the key in the default locale as well, it capitalizes the final segment of the key and returns that.)
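
The fallback chain, reduced to a sketch: try the requested locale, then the default, then give up and derive a label from the key. `STRINGS` is sample data with flattened keys, `translate_with_fallback` is a hypothetical helper, and the "homepage.footer" entry is invented for the example.

```ruby
# Sample data: Spanish is missing the footer string.
STRINGS = {
  "en" => {
    "homepage.welcome" => "This is the welcome message.",
    "homepage.footer"  => "Contact us"
  },
  "es" => {
    "homepage.welcome" => "Este es el mensaje de bienvenida."
  }
}.freeze

def translate_with_fallback(key, locale, default_locale: "en")
  [locale, default_locale].map(&:to_s).uniq.each do |candidate|
    value = STRINGS.dig(candidate, key)
    return value if value
  end
  # Nothing found anywhere: capitalize the last key segment.
  key.split(".").last.split("_").map(&:capitalize).join(" ")
end

puts translate_with_fallback("homepage.welcome", :es)     # Este es el mensaje de bienvenida.
puts translate_with_fallback("homepage.footer", :es)      # Contact us
puts translate_with_fallback("homepage.press_kit", :es)   # Press Kit
```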

Language-specific CSS: Phrase length can vary quite a bit between languages. Sometimes, every language works with a certain layout except one. To accommodate different lengths, we reduce the size of the longer languages in CSS.

.platform_features { :lang(de) {font-size:90%;} }

Translation management

The final issue we had with this approach was overall translation management. With 3k strings, we inevitably forgot to translate strings from the default locale. When we did, I18n wrapped the missing translations in a span with the class translation_missing.

However, this mechanism trusted our testing to identify missing keys. Moreover, members of our team fluent in other languages (like our European sales team) were not always satisfied with the translations on the site, but there was no place to see all the translations for their respective language.

To solve both problems, I wrote some Ruby code to collect every key, identify those with missing translations, and display both sets on a special page of the website. (Thanks to koppen/i18n_missing_keys for the core logic, although glebm/i18n-tasks is a good up-to-date alternative.)

The end result is a page at /missing_keys that lists every English string with its translation, followed by every untranslated key and the languages it is missing from:

# every key (with :es locale)
"This is the welcome message.": "Este es el mensaje de bienvenida."

# every untranslated key
homepage.welcome [de,fr,it,ja]: This is the welcome message.

The workflow consists of three pieces:

  • missing_keys.rb: the model object that reads in the YAML files, ignores the keys listed in ignore_missing_keys.yml, and identifies which keys are missing from which languages
  • home_controller.rb: the controller logic that the route (/missing_keys in this case) points to; it gathers all the keys and the missing keys for the view
  • missing_keys.html.erb: a page on the site that lists all the keys, with a section for the missing keys and each language they’re missing from

Below are the three blocks of code.

A small trick in missing_keys.rb is overriding the fallback language. To display the page in a production environment, we enable a default fallback, but I need to override that choice for finding missing keys.

# Override fallback.
I18n.translate(key, :locale => locale, :fallback => [], :raise => true)
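
The core comparison can be sketched without Rails at all. To be clear, this is not our missing_keys.rb and not the logic from koppen/i18n_missing_keys; `flatten_keys`, `missing_keys`, and `missing_report` are hypothetical helpers working on already-loaded hashes, and the `ignored` list stands in for ignore_missing_keys.yml.

```ruby
# Flatten a nested translation tree into { "dotted.key" => text }.
def flatten_keys(tree, prefix = nil)
  tree.each_with_object({}) do |(key, value), flat|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flat.merge!(flatten_keys(value, path))
    else
      flat[path] = value
    end
  end
end

# For each key in the default locale, list the locales that lack it.
def missing_keys(translations, default_locale: "en", ignored: [])
  flat = translations.transform_values { |tree| flatten_keys(tree) }
  others = translations.keys - [default_locale]
  flat.fetch(default_locale).each_with_object({}) do |(key, _text), missing|
    next if ignored.include?(key)

    absent = others.reject { |locale| flat[locale].key?(key) }
    missing[key] = absent unless absent.empty?
  end
end

# Format the result like the listing on the /missing_keys page.
def missing_report(translations, default_locale: "en", ignored: [])
  defaults = flatten_keys(translations.fetch(default_locale))
  missing_keys(translations, default_locale: default_locale, ignored: ignored).map do |key, locales|
    "#{key} [#{locales.join(',')}]: #{defaults[key]}"
  end
end

translations = {
  "en" => { "homepage" => { "welcome" => "This is the welcome message." } },
  "de" => {},
  "es" => { "homepage" => { "welcome" => "Este es el mensaje de bienvenida." } },
  "fr" => {}, "it" => {}, "ja" => {}
}
puts missing_report(translations)
# homepage.welcome [de,fr,it,ja]: This is the welcome message.
```

Working from the raw per-locale data sidesteps fallbacks entirely, which is the same reason the real code has to switch them off before calling I18n.translate.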

Mixing strategies

Not all of our pages are fully localized with t(). When we converted our user guides from PageMaker and InDesign to HTML in 2013, consolidating the languages was too daunting, and we left them as separate language-specific HTML files (e.g. chapter1.en.html.erb). However, we added t() calls at the top to specify their titles.

We use both approaches together.


Shipping today

As I said at the beginning, Retrospect.com is around 400 pages today with 3,000 strings across 80 YAML files. Using Rails and its I18n gem allows us to extract our localization out of the HTML and into YAML files for simple site changes and easy translation, and a bit of custom code allows us to manage the workflow.

Migrating 10k strings was incredibly tedious though.
