Abstract:
Spelling messages like "1 file found" or "5 files found" correctly in any language.

Created 1 year ago by Peter Kankowski
Last changed 5 months ago
Filed under Algorithms

Plural forms

Introduction

Messages like "%d file(s) found" are notoriously hard to localize. English has only 2 forms: 1 file (singular) and all other counts, including 0 files (plural), but other languages use 3 or more plural forms. For example, there are 3 forms in Polish:

    0 plików
    1 plik
  2-4 pliki
 5-21 plików
22-24 pliki
25-31 plików
      etc.

Other languages (French, Russian, Czech, etc.) also use rules different from English and from each other.

The gettext library reads the rule for selecting a plural form from the Plural-Forms header of the message catalog. The rule is a C-language expression in the variable n, which is evaluated for each message. It's a universal solution, but an expression evaluator is probably overkill for this task.

Simpler solution

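Here are some observations about the languages mentioned on the gettext page. Each plural form covers a range of numbers (for example, 2-4 in Polish), and the last form takes all the numbers not covered by the others. In some languages, the range repeats every 10 numbers (22-24, 32-34, and so on in Polish). Numbers whose last two digits are 10-19 are often exceptions to this repetition (12-14 take the "plików" form in Polish, not "pliki").

So, the rule for each plural form consists of these components: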

range_start  range_end  modulo_for_repetition  skip_teens_flag

Here are some examples:

English
singular:  range_start = 1, range_end = 1
plural:    all other numbers

Polish
singular:  range_start = 1, range_end = 1
plural1:   range_start = 2, range_end = 4, modulo = 10, skip_teens = true
plural2:   all other numbers

Irish
singular:  range_start = 1, range_end = 1
plural1:   range_start = 2, range_end = 2
plural2:   all other numbers

Lithuanian
singular:  range_start = 1, range_end = 1, modulo = 10, skip_teens = true
plural1:   range_start = 2, range_end = 9, modulo = 10, skip_teens = true
plural2:   all other numbers (those ending in 0 or in 11-19: 10-20, 30, 40, 110-120, etc.)

The rules for each language can be encoded as a short string stored in the language file (e.g., for Lithuanian, the string is "1 1 10 t; 2 9 10 t").

Using the Code

Include plurals.h and plurals.c in your project. The interface consists of two functions. First, you call PluralsReadCfg to read the rules from the string. Next, you pass a number to PluralsGetForm. It returns the index of the correct plural form for this number, which you use to read the string from your language file:

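PLURAL_INFO plurals;
/* Parse the rules once, after loading the language file */
PluralsReadCfg(&plurals, ReadFromLngFile("PluralRules"));

char lng_str_name[16], message[128];
/* Build the string name: "FilesFound0", "FilesFound1", ... */
sprintf(lng_str_name, "FilesFound%d", PluralsGetForm(&plurals, number));
/* Use the localized string as the format for the number */
sprintf(message, ReadFromLngFile(lng_str_name), number);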

In the language file, you have strings for each plural form:

PluralRules = "1"
FilesFound0 = "%d file found"
FilesFound1 = "%d files found"

ReadFromLngFile is your own function that returns a string from the language file. You could wrap the two sprintf calls in a higher-level function (and, of course, use a bounded function such as snprintf instead of sprintf to protect your program from buffer overflows).

An even better solution is to implement a custom formatting function, so you could write something like "%d %(file|files) found" in the language file. Scott Rippey devised this technique and implemented it in VB .NET.

Conclusion

Together, the two functions PluralsReadCfg and PluralsGetForm add only 500 bytes to your executable when compiled with MSVC++. A small price to pay for spelling your messages correctly in any language.

Download the source code (25 KB, MSVC++)
