Internationalization

Users expect our software to speak their language. Unfortunately for us, there will likely be more than one language involved. While doing simple string replacement isn’t too involved, correctly dealing with all the grammar issues can be tricky. After all, who wants to see "List 1 file(s)" from a program output?

But a real i18n solution needs to do more than just provide a means of achieving the correct output. It needs to make this process easy for both the programmer and the translator and relatively error-proof. Yesod’s answer to the problem gives you:

Intelligent guessing of the user’s desired language based on request headers, with the ability to override.
A simple syntax for giving translations which requires no Haskell knowledge. (After all, most translators aren’t programmers.)
The ability to bring in the full power of Haskell for tricky grammar issues as necessary, along with a default selection of helper functions to cover most needs.
Absolutely no issues at all with word order.

Synopsis

-- @messages/en.msg
Hello: Hello
EnterItemCount: I would like to buy:
Purchase: Purchase
ItemCount count@Int: You have purchased #{showInt count} #{plural count "item" "items"}.
SwitchLanguage: Switch language to:
Switch: Switch

-- @messages/he.msg
Hello: שלום
EnterItemCount: אני רוצה לקנות:
Purchase: קנה
ItemCount count: קנית #{showInt count} #{plural count "דבר" "דברים"}.
SwitchLanguage: החלף שפה ל:
Switch: החלף

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings     #-}
{-# LANGUAGE QuasiQuotes           #-}
{-# LANGUAGE TemplateHaskell       #-}
{-# LANGUAGE TypeFamilies          #-}
import           Yesod

data App = App

mkMessage "App" "messages" "en"

plural :: Int -> String -> String -> String
plural 1 x _ = x
plural _ _ y = y

showInt :: Int -> String
showInt = show

instance Yesod App

instance RenderMessage App FormMessage where
    renderMessage _ _ = defaultFormMessage

mkYesod "App" [parseRoutes|
/     HomeR GET
/buy  BuyR  GET
/lang LangR POST
|]

getHomeR :: Handler Html
getHomeR = defaultLayout
    [whamlet|
        <h1>_{MsgHello}
        <form action=@{BuyR}>
            _{MsgEnterItemCount}
            <input type=text name=count>
            <input type=submit value=_{MsgPurchase}>
        <form action=@{LangR} method=post>
            _{MsgSwitchLanguage}
            <select name=lang>
                <option value=en>English
                <option value=he>Hebrew
            <input type=submit value=_{MsgSwitch}>
    |]

getBuyR :: Handler Html
getBuyR = do
    count <- runInputGet $ ireq intField "count"
    defaultLayout [whamlet|<p>_{MsgItemCount count}|]

postLangR :: Handler ()
postLangR = do
    lang <- runInputPost $ ireq textField "lang"
    setLanguage lang
    redirect HomeR

main :: IO ()
main = warp 3000 App

Overview

Most existing i18n solutions out there, like gettext or Java message bundles, work on the principle of string lookups. Usually some form of printf-interpolation is used to interpolate variables into the strings. In Yesod, as you might guess, we instead rely on types. This gives us all of our normal advantages, such as the compiler automatically catching mistakes.

Let’s take a concrete example. Suppose our application has two things it wants to say to a user: say hello, and state how many users are logged into the system. This can be modeled with a sum type:

data MyMessage = MsgHello | MsgUsersLoggedIn Int

I can also write a function to turn this datatype into an English representation:

toEnglish :: MyMessage -> String
toEnglish MsgHello = "Hello there!"
toEnglish (MsgUsersLoggedIn 1) = "There is 1 user logged in."
toEnglish (MsgUsersLoggedIn i) = "There are " ++ show i ++ " users logged in."

We can also write similar functions for other languages. The advantage to this inside-Haskell approach is that we have the full power of Haskell for addressing tricky grammar issues, especially pluralization.

The downside, however, is that you have to write all of this inside of Haskell, which won’t be very translator-friendly. To solve this, Yesod introduces the concept of message files. We’ll cover that in a little bit.

Assuming we have this full set of translation functions, how do we go about using them? What we need is a new function to wrap them all up together, and then choose the appropriate translation function based on the user’s selected language. Once we have that, Yesod can automatically choose the most relevant render function and call it on the values you provide.

In order to simplify things a bit, Hamlet has a special interpolation syntax, _{…}, which handles all the calls to the render functions. And in order to associate a render function with your application, you use the YesodMessage typeclass.

Message files

The simplest approach to creating translations is via message files. The setup is simple: there is a single folder containing all of your translation files, with a single file for each language. Each file is named based on its language code, e.g. en.msg. And each line in a file handles one phrase, which correlates to a single constructor in your message data type.

So firstly, a word about language codes. There are really two choices available: using a two-letter language code, or a language-LOCALE code. For example, when I load up a page in my web browser, it sends two language codes: en-US and en. What my browser is saying is "if you have American English, I like that the most. If you have English, I’ll take that instead."

So which format should you use in your application? Most likely two-letter codes, unless you are actually creating separate translations by locale. This ensures that someone asking for Canadian English will still see your English. Behind the scenes, Yesod will add the two-letter codes where relevant. For example, suppose a user has the following language list:

pt-BR, es, he

What this means is "I like Brazilian Portuguese, then Spanish, and then Hebrew." Suppose your application provides the languages pt (general Portuguese) and English, with English as the default. Strictly following the user’s language list would result in the user being served English. Instead, Yesod translates that list into:

pt-BR, es, he, pt

In other words: unless you’re giving different translations based on locale, just stick to the two-letter language codes.

Now what about these message files? The syntax should be very familiar after your work with Hamlet and Persistent. The line starts off with the name of the message. Since this is a data constructor, it must start with a capital letter. Next, you can have individual parameters, which must be given as lower case. These will be arguments to the data constructor.

The argument list is terminated by a colon, and then followed by the translated string, which allows usage of our typical variable interpolation syntax translation helper functions to deal with issues like pluralization, you can create all the translated messages you need.

Specifying types

Since we will be creating a datatype out of our message specifications, each parameter to a data constructor must be given a data type. We use a @-syntax for this. For example, to create the datatype data MyMessage = MsgHello | MsgSayAge Int, we would write:

Hello: Hi there!
SayAge age@Int: Your age is: #{show age}

But there are two problems with this:

It’s not very DRY (don’t repeat yourself) to have to specify this datatype in every file.
Translators will be confused having to specify these datatypes.

So instead, the type specification is only required in the main language file. This is specified as the third argument in the mkMessage function. This also specifies what the backup language will be, to be used when none of the languages provided by your application match the user’s language list.

RenderMessage typeclass

Your call to mkMessage creates an instance of the RenderMessage typeclass, which is the core of Yesod’s i18n. It is defined as:

class RenderMessage master message where
    renderMessage :: master
                  -> [Text] -- ^ languages
                  -> message
                  -> Text

Notice that there are two parameters to the RenderMessage class: the master site and the message type. In theory, we could skip the master type here, but that would mean that every site would need to have the same set of translations for each message type. When it comes to shared libraries like forms, that would not be a workable solution.

The renderMessage function takes a parameter for each of the class’s type parameters: master and message. The extra parameter is a list of languages the user will accept, in descending order of priority. The method then returns a user-ready Text that can be displayed.

A simple instance of RenderMessage may involve no actual translation of strings; instead, it will just display the same value for every language. For example:

data MyMessage = Hello | Greet Text
instance RenderMessage MyApp MyMessage where
    renderMessage _ _ Hello = "Hello"
    renderMessage _ _ (Greet name) = "Welcome, " <> name <> "!"

Notice how we ignore the first two parameters to renderMessage. We can now extend this to support multiple languages:

renderEn Hello = "Hello"
renderEn (Greet name) = "Welcome, " <> name <> "!"
renderHe Hello = "שלום"
renderHe (Greet name) = "ברוכים הבאים, " <> name <> "!"
instance RenderMessage MyApp MyMessage where
    renderMessage _ ("en":_) = renderEn
    renderMessage _ ("he":_) = renderHe
    renderMessage master (_:langs) = renderMessage master langs
    renderMessage _ [] = renderEn

The idea here is fairly straight-forward: we define helper functions to support each language. We then add a clause to catch each of those languages in the renderMessage definition. We then have two final cases: if no languages matched, continue checking with the next language in the user’s priority list. If we’ve exhausted all languages the user specified, then use the default language (in our case, English).

But odds are that you will never need to worry about writing this stuff manually, as the message file interface does all this for you. But it’s always a good idea to have an understanding of what’s going on under the surface.

Interpolation

One way to use your new RenderMessage instance would be to directly call the renderMessage function. This would work, but it’s a bit tedious: you need to pass in the foundation value and the language list manually. Instead, Hamlet provides a specialized i18n interpolation, which looks like _{…}.

Hamlet will then automatically translate that to a call to renderMessage. Once Hamlet gets the output Text value, it uses the toHtml function to produce an Html value, meaning that any special characters (<, &, >) will be automatically escaped.

Phrases, not words

As a final note, I’d just like to give some general i18n advice. Let’s say you have an application for selling turtles. You’re going to use the word "turtle" in multiple places, like "You have added 4 turtles to your cart." and "You have purchased 4 turtles, congratulations!" As a programmer, you’ll immediately notice the code reuse potential: we have the phrase "4 turtles" twice. So you might structure your message file as:

AddStart: You have added
AddEnd: to your cart.
PurchaseStart: You have purchased
PurchaseEnd: , congratulations!
Turtles count@Int: #{show count} #{plural count "turtle" "turtles"}

STOP RIGHT THERE! This is all well and good from a programming perspective, but translations are not programming. There are a many things that could go wrong with this, such as:

Some languages might put "to your cart" before "You have added."
Maybe "added" will be constructed differently depending on whether you added 1 or more turtles.
There are a bunch of whitespace issues as well.

So the general rule is: translate entire phrases, not just words.