Notice

This page shows a previous version of the article.

Adding a New Language to the Corpus (for Developers)

Adding a new language to Tatoeba takes four steps:

  1. Add the language in LanguagesLib. This makes the language available in the various dropdown lists, so that users can add or translate sentences in the new language.
  2. Add the language icon. This is the icon displayed next to every sentence.
  3. Update the languages table in the database. This is needed for the statistics page, which lists all the languages and the number of sentences in each one. It is also needed to display sentence counts on some other pages.
  4. Update the sphinx.conf. This is needed so that users can search for sentences in the new language.

1. Add the language in LanguagesLib

  • Open the file app/vendors/languages_lib.php

  • Add the language in languagesInTatoeba().

  • If the language has an ISO 639-1 code, also add it to get_Iso639_3_To_Iso639_1_Map().

  • If the language is written right to left, also add it to getLanguageDirection($lang).
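
The three bullets above amount to a small decision table. As a minimal sketch, the helper below (a hypothetical checklist function, not part of Tatoeba's code base) lists which LanguagesLib functions need an entry for a given language:

```python
from typing import List, Optional


def languages_lib_edits(iso639_1: Optional[str], right_to_left: bool) -> List[str]:
    """List the LanguagesLib functions that need an entry for a new language.

    iso639_1 is the two-letter ISO 639-1 code, or None if the language has none.
    """
    # Every new language goes into the main list.
    edits = ["languagesInTatoeba()"]
    # Only languages with an ISO 639-1 code need the 639-3 to 639-1 mapping.
    if iso639_1:
        edits.append("get_Iso639_3_To_Iso639_1_Map()")
    # Only right-to-left languages need a direction entry.
    if right_to_left:
        edits.append("getLanguageDirection($lang)")
    return edits


if __name__ == "__main__":
    # Hebrew has the ISO 639-1 code "he" and is written right to left.
    for function_name in languages_lib_edits("he", right_to_left=True):
        print("edit", function_name)
```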

2. Add the icon

  • The icon needs to be a PNG file, named with the ISO 639-3 language code.

  • About the look of the icon: it must measure 30x20 pixels. In theory, each icon has a 1px line of color #dcdcdc along its bottom and right edges. Most icons have also had their luminosity adjusted, so that they look slightly paler than the original image. These details are not essential for now; what matters most is that the icon is a 30x20 PNG file.

  • It must be placed in app/webroot/img/flags.
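
The two hard requirements above (PNG format, 30x20 pixels) and the naming rule can be checked before committing the icon. The following is a rough sketch using only the Python standard library; the function names are illustrative, and it assumes that an ISO 639-3 code is always three lowercase letters:

```python
import re
import struct
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = (30, 20)  # width x height required for language icons


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", data[16:24])


def check_icon(filename: str, data: bytes) -> List[str]:
    """Return a list of problems found; an empty list means the icon passes."""
    problems = []
    # Assumption: the file name is the three-letter ISO 639-3 code plus ".png".
    if not re.fullmatch(r"[a-z]{3}\.png", filename):
        problems.append("file name should be <ISO 639-3 code>.png")
    try:
        size = png_size(data)
    except ValueError:
        problems.append("file is not a PNG")
    else:
        if size != EXPECTED_SIZE:
            problems.append("size is %dx%d, expected 30x20" % size)
    return problems
```

To check a real file, read it in binary mode and pass its base name and contents, for example `check_icon("heb.png", open("app/webroot/img/flags/heb.png", "rb").read())`. The border line and luminosity are not checked, since the text above treats them as optional.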

3. Run the MySQL add_new_language() procedure

  • The script to create the add_new_language() procedure can be found in docs/database/procedures/add_new_language.sql. If your database does not have this procedure, then run this script first.

  • Connect to the database and execute the procedure: CALL add_new_language('<lang>', <listId>); Replace <lang> with the ISO 639-3 code of the new language and <listId> with the ID of the list that contains the sentences in that language.
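
Because the statement is typed by hand, a typo in the code or list ID is easy to make. As a small sketch, the hypothetical helper below validates both values and builds the exact statement to paste into the MySQL client. It assumes the code is three lowercase letters and the list ID is a positive integer; it does not connect to the database.

```python
import re


def add_new_language_call(lang: str, list_id: int) -> str:
    """Build the CALL statement for the add_new_language() procedure."""
    # Assumption: ISO 639-3 codes are exactly three lowercase letters.
    if not re.fullmatch(r"[a-z]{3}", lang):
        raise ValueError("expected a three-letter ISO 639-3 code, got %r" % lang)
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(list_id, bool) or not isinstance(list_id, int) or list_id <= 0:
        raise ValueError("expected a positive integer list id, got %r" % (list_id,))
    return "CALL add_new_language('%s', %d);" % (lang, list_id)


if __name__ == "__main__":
    # 42 is a made-up list id used only for illustration.
    print(add_new_language_call("heb", 42))
```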

4. Update the sphinx.conf

  • Generate the sphinx.conf file with the CakePHP shell: cake -app "/path/to/app" sphinx_conf > sphinx.conf.
  • Replace the old sphinx.conf with the new one.
  • Re-index the sentences using the new conf: indexer --all --rotate.
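
The three steps above can be chained in a short script. This is only a sketch: the function name, the ".new" temporary file, and the dry-run default are choices made for this example, and it assumes that cake and indexer are on the PATH and that indexer picks up the replaced sphinx.conf from its usual location.

```python
import os
import subprocess
from typing import List


def update_sphinx(app_path: str, conf_path: str, dry_run: bool = True) -> List[List[str]]:
    """Regenerate sphinx.conf and re-index; return the commands involved."""
    generate = ["cake", "-app", app_path, "sphinx_conf"]
    reindex = ["indexer", "--all", "--rotate"]
    if not dry_run:
        # Write to a temporary file first, so that a failed generation
        # does not overwrite the working configuration.
        new_conf = conf_path + ".new"
        with open(new_conf, "w") as out:
            subprocess.run(generate, stdout=out, check=True)
        # Replace the old sphinx.conf with the new one.
        os.replace(new_conf, conf_path)
        # Re-index the sentences using the new configuration.
        subprocess.run(reindex, check=True)
    return [generate, reindex]


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for command in update_sphinx("/path/to/app", "sphinx.conf"):
        print(" ".join(command))
```

Calling `update_sphinx("/path/to/app", "sphinx.conf", dry_run=False)` runs the commands for real; check=True stops the script at the first command that fails.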