Managing pluralized translations

This page explains how Loco handles plural forms and how to prepare them for different platforms. If you already have a good understanding of pluralization on your platform, you might want to skip the introduction.

Introduction to plurals

Many languages have two grammatical forms, singular and plural (e.g. "1 file" vs "2 files"). However, some languages have more than two forms (e.g. Polish and Russian) and some have only one (e.g. Chinese and Japanese). Arabic has the most, with six different plural forms.

If you have phrases containing an arbitrary quantity you'll need a translation for every form that your language uses. This will usually contain a placeholder for the quantity value, so our example becomes "1 file" and "%d files" (where %d would be replaced with a number).

Given the required quantity, your localization framework has the job of selecting the most appropriate option from this list. Precisely how it achieves this may be hidden from you, but you'll need to know something about its language rules so you can prepare the correct translations. See your platform documentation for the rules it uses.

Plural rules

Loco uses Unicode's six plural categories. Instead of just singular and plural, we have: "zero", "one", "two", "few", "many" and "other".

  • All languages use an "other" form as their final default.
  • Note that "zero", "one" and "two" may not mean precisely 0, 1 and 2. For example "two" might mean a quantity ending in 2, like 102, 202, 302 and so on.

The first thing to do within Loco is check that your language is configured with the same rules that your platform is expecting. From the project management screen, hover over a language and click the cog icon to edit its properties. Below is what Loco's built-in rules will look like for Polish:

img

We'll use Polish as an example throughout this page. The additional "few" form is used for numbers ending in 2, 3 or 4 (but not 12, 13 or 14).

Ignore the cumbersome formula for the moment; it's not needed on most platforms and we'll come to it later. The most important thing is that the plural forms are correct and in the right order for your application.

Please note that these are rules for cardinal plural forms (used for quantities). Loco doesn't currently support separate rules for ordinal forms or ordinal ranges. See limitations.

Ordered forms

The simplest method used by many platforms is to map plural forms to a simple series of options. To illustrate the concept, here's what a pluralized message might look like for English and Polish as JSON arrays:

{
  "fileCount": [ "1 file", "%d files" ] 
}
{
  "fileCount": [ "1 plik", "%d pliki", "%d plików" ] 
}
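
As a minimal sketch of the array principle, the Python below stores the two arrays above and renders a message once an offset has been chosen. The render function and the MESSAGES structure are hypothetical names used only for illustration, and the offset is passed in by hand because choosing it is the job of the plural formula described later on this page.

```python
# Hypothetical in-memory copy of the two JSON examples above.
MESSAGES = {
    "en": {"fileCount": ["1 file", "%d files"]},
    "pl": {"fileCount": ["1 plik", "%d pliki", "%d plików"]},
}


def render(locale, key, index, quantity):
    """Pick the form at the given offset and fill in the quantity."""
    template = MESSAGES[locale][key][index]
    # str.replace is used instead of the % operator so that forms
    # without a placeholder (such as "1 file") pass through unchanged.
    return template.replace("%d", str(quantity))


print(render("en", "fileCount", 1, 2))
print(render("pl", "fileCount", 2, 5))
```

A real framework would work out the offset itself from the quantity and the locale's rules.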

Other platforms use variations on the array principle. Here's the same example in the more verbose syntax of Gettext PO files. Note that English and Polish are combined into the same file. The Gettext format is limited to source languages with exactly two forms, but the target has no limit:

msgid "1 file"
msgid_plural "%d files"
msgstr[0] "1 plik"
msgstr[1] "%d pliki"
msgstr[2] "%d plików"

The order is important in these examples, but once you've set the rules your translators won't need to worry about them. Within the translation editor they will see the named forms from your locale settings, as follows:

img

Named forms

Some platforms reference these named forms directly in their file format. Below is the same example in Android's XML strings format:

<resources>
    <plurals name="fileCount">
        <item quantity="one">1 plik</item>
        <item quantity="few">%d pliki</item>
        <item quantity="other">%d plików</item>
    </plurals>
</resources>

In this case [0], [1] and [2] are more usefully described as "one", "few" and "other", so you don't need to worry so much about the order. Although this looks like a different system, it's exactly the same concept.

The named forms are not just easier to read, but they also abstract away the problem that "other" is the second form in English, but the third in Polish. Numeric offsets can make this confusing in some cases.
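
The relationship between names and offsets can be sketched in a few lines of Python. The FORM_ORDER table and the offset_of helper are assumptions made for this illustration (they mirror the English and Polish examples on this page, not any particular platform's API).

```python
# Ordered list of named forms per language, as used in the examples above.
FORM_ORDER = {
    "en": ["one", "other"],
    "pl": ["one", "few", "other"],
}


def offset_of(locale, name):
    """Return the numeric offset of a named form for a language."""
    return FORM_ORDER[locale].index(name)


# "other" sits at a different offset in each language.
print(offset_of("en", "other"))
print(offset_of("pl", "other"))
```

Looking forms up by name means the calling code never has to care which offset a form occupies.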

Loco supports numbered and named plural forms, and exports your files in the correct way for the target platform.

The plural formula

If your platform supports pluralization, it will have a method for selecting which translation to use for a given quantity. This is usually done by consulting a formula for a given locale. To illustrate the concept, English could be expressed as follows:

category = ( quantity == 1 ? "one" : "other" )

In the Gettext style, we evaluate the numeric offset (0 or 1):

( n == 1 ? 0 : 1 )

The above would yield the first form at [0] only for a quantity of n = 1.
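
For comparison, here is a sketch of the same two English rules written as Python functions. The function names are hypothetical, and only integer quantities are considered.

```python
def english_category(quantity):
    """Named style: return the plural category for an integer quantity."""
    return "one" if quantity == 1 else "other"


def english_offset(n):
    """Gettext style: return the numeric offset (0 or 1)."""
    return 0 if n == 1 else 1


for n in (0, 1, 2):
    print(n, english_category(n), english_offset(n))
```

Note that zero falls into "other" (offset 1) along with every quantity except 1.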

Most of Loco's built-in languages use formulas from the Unicode CLDR rules, but the precise rules for your project locales can be configured. Note that many platforms only support integer quantities, but the CLDR rules include fractions. This may explain some differences you find.

The CLDR states that Polish has four forms although the official Gettext example shows three. The latter has been widely adopted by many frameworks and libraries including WordPress and Symfony. Check your platform's documentation for the exact rule it uses.
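
As a rough illustration, the three-form Polish rule from the Gettext manual's example can be translated into Python as below. The function names and the FORMS list are hypothetical, the rule covers integers only, and your platform may use a different rule, so treat this as a sketch rather than a reference implementation.

```python
# Polish forms in Gettext order: [0] one, [1] few, [2] other.
FORMS = ["1 plik", "%d pliki", "%d plików"]


def polish_gettext_offset(n):
    """Three-form Polish rule, following the Gettext manual's example:
    n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n == 1:
        return 0
    # Ends in 2, 3 or 4, but not 12, 13 or 14.
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def file_count(n):
    """Select the form for n and substitute the quantity."""
    return FORMS[polish_gettext_offset(n)].replace("%d", str(n))


for n in (1, 2, 5, 12, 22):
    print(file_count(n))
```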

Managing plurals in Loco

Loco takes a platform-agnostic approach to pluralizing translations by treating each form as a separate translatable asset in its own right. These assets are then linked together according to the locale settings you've defined.

Often you'll be importing files into Loco that already define plurals. Depending on the format, pluralized translations will be identified and linked together automatically. You can check these bindings in the Management tab of your project, and modify them as needed:

img

The screenshot above shows three assets linked together as a set of plurals in the main project management list. This is only done once for each set of assets, and is not per-language. The set is a linear sequence [ 0, 1, 2 ], but each language knows its forms by the names in your settings.

Clicking the plural icon will open a dialogue where you can alter the order of linked assets and add new forms as needed:

img

The number of form fields you see here is dictated by the largest number required by your project. If you add Arabic to your project, you will see the maximum six fields.

For languages where a form is not used, its translations will be flagged as blank automatically. This means the asset will always be treated as "complete" when not needed.

Limitations of cardinal rules

The plural forms we've discussed so far are called "Cardinal" plurals. This is by far the most common use of plurals in localisation frameworks, and most don't handle any other kind. Consider "Ordinal" plurals (e.g. "You came 1st in the race"). Most platforms can't perform rule selection for this and programmers have to code around the problem.

Consider also that a fixed set of rules for every translation in your application is very restrictive. Suppose you wanted a special case for just one number (e.g. "You have no messages"). English normally bundles zero into the "other" form, so making an exception would be another programmer workaround.
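
A typical workaround of this kind might look like the Python sketch below. The message_count function and its "1 message" and "%d messages" strings are invented for this example. The point is that the zero case lives in application code, outside the plural rules.

```python
def message_count(n):
    """English message count with a hand-coded exception for zero."""
    # Special case handled by the programmer, not by the plural rules.
    if n == 0:
        return "You have no messages"
    # Normal English rule: "one" for exactly 1, "other" for the rest.
    template = "You have 1 message" if n == 1 else "You have %d messages"
    return template.replace("%d", str(n))


for n in (0, 1, 3):
    print(message_count(n))
```

Every language would need the same exception to be coded and translated separately, which is why this approach scales poorly.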

These problems don't just affect pluralization. Many languages have gendered grammar. Translating a phrase like "%s is happy" may require multiple translations, even if %s simply represents a person's name.

One solution to these problems is to use an embedded syntax in the translation text itself, such that it is independent of rules and encapsulates all its own logic. You are free to enter any formatting you like into your translations, but Loco can only treat custom code as a block of text.

ICU MessageFormat

ICU MessageFormat is a powerful syntax available on multiple platforms. There are official libraries for C and Java and it is well supported in PHP's Intl extension. Other libraries such as Format.js have native implementations, so it's a good choice of format if you have challenging localisation logic.

Loco's support for ICU MessageFormat extends to syntax highlighting, validation and a tool for initializing pluralized messages from your locale settings. As an alternative to linking multiple assets you can initialize a single ICU message by clicking the plural icon and selecting "Embed plurals with ICU syntax":

img

Here you can specify custom plural rules and Loco will generate a default message that looks like this:

{n,plural,
  =0 {no files}
  one {a file}
  other {# files}
}

Translators can use the same tool to initialize the message in their languages, but if they're not familiar with the syntax it's advisable to do this for them and flag the translation as "incomplete" for them to fill in.

You are free to change the default {n} placeholder and make any other additions to the syntax you wish. Plural selectors will still be detected, but be wary of complex nested messages. Loco will understand them, but you may confuse your translators. It's advisable to split complex messages into multiple assets when possible, as this will help translators navigate less code.
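
To give a feel for how a formatter evaluates the flat default message shown earlier, here is a deliberately tiny Python sketch. It is not an ICU implementation: the format_flat_plural function is hypothetical, it hard-codes the English rule, and it ignores nesting, quoting, offsets and locale-aware number formatting. Use a real ICU library in your application.

```python
import re

MESSAGE = """{n,plural,
  =0 {no files}
  one {a file}
  other {# files}
}"""


def english_category(n):
    """English cardinal rule for integers."""
    return "one" if n == 1 else "other"


def format_flat_plural(message, n):
    """Evaluate a single, non-nested plural message for quantity n."""
    # Split "{arg,plural, ...options...}" into argument name and options.
    match = re.fullmatch(
        r"\{\s*(\w+)\s*,\s*plural\s*,(.*)\}", message.strip(), re.DOTALL
    )
    if match is None:
        raise ValueError("not a flat plural message")
    # Collect "selector {text}" pairs; nested braces are not supported.
    options = dict(re.findall(r"(=\d+|\w+)\s*\{([^{}]*)\}", match.group(2)))
    # Exact matches such as "=0" win over the category, then "other".
    for key in ("=%d" % n, english_category(n), "other"):
        if key in options:
            # "#" stands for the quantity inside a plural option.
            return options[key].replace("#", str(n))
    raise ValueError("message has no 'other' option")


for n in (0, 1, 5):
    print(format_flat_plural(MESSAGE, n))
```

Because the exact "=0" selector is checked before the category, the "no files" case needs no workaround in application code.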

The MessageFormat syntax is so versatile that Loco doesn't currently insulate translators from it. Below is how this message will appear with "code view" enabled. Note that whitespace around the plural form syntax should not get printed by your application's formatter:

img

ICU support in Loco is new, so please let us know how you get on with it.

See also our Developer guide to string formatting.