Notice
This page shows a previous version of the article.
Adding a New Language to the Corpus (for Developers)
There are basically 4 steps to add a new language in Tatoeba:
- Add the language in LanguagesLib. This makes the language available in the various dropdown lists, so that users can add or translate sentences in the new language.
- Add the language icon. The icon is displayed next to every sentence in that language.
- Update the languages table in the database. This is needed for the statistics page, which lists all the languages and the number of sentences in each language. It is also needed to display the sentence count on some other pages.
- Update the sphinx.conf. This is needed so that users can search sentences in the new language.
1. Add the language in LanguagesLib
Open the file app/vendors/languages_lib.php.
- Add the language in languagesInTatoeba().
- If the language has an ISO 639-1 code, also add it in get_Iso639_3_To_Iso639_1_Map().
- If the language is a right-to-left language, also add it in getLanguageDirection($lang).
2. Add the icon
The icon needs to be a PNG file, named with the ISO 639-3 language code.
It must be placed in app/webroot/img/flags.
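Before copying the icon, it can help to check that the file really is a PNG and that its name matches the expected pattern. The function below is a minimal sketch, not part of Tatoeba: check_flag_icon is a hypothetical helper, and it assumes the ISO 639-3 code is written as three lowercase letters.

```shell
# Hypothetical helper (not part of the Tatoeba code base): checks that an
# icon file is named <code>.png and starts with the PNG signature.
check_flag_icon() {
    icon_path=$1
    file_name=$(basename "$icon_path")

    # Assumption: the ISO 639-3 code is three lowercase letters.
    case "$file_name" in
        [a-z][a-z][a-z].png) ;;
        *) echo "unexpected file name: $file_name" >&2; return 1 ;;
    esac

    # Every PNG file starts with the bytes 0x89 'P' 'N' 'G'.
    signature=$(od -An -tx1 -N4 "$icon_path" 2>/dev/null | tr -d ' \n')
    if [ "$signature" != "89504e47" ]; then
        echo "not a PNG file: $icon_path" >&2
        return 1
    fi
    return 0
}
```

For example, check_flag_icon epo.png && cp epo.png app/webroot/img/flags/ would copy the icon only if both checks pass.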
3. Run the MySQL add_new_language() procedure
The script to create the add_new_language() procedure can be found in docs/database/procedures/add_new_language.sql. If your database does not have this procedure, then run this script first.
Connect to the database and execute the procedure:
CALL add_new_language('<lang>', <listId>);
Replace <lang> with the ISO 639-3 code and <listId> with the id of the list that contains the sentences in that language.
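If you prefer to run the procedure from the command line, the statement can be built and checked first. This is only a sketch under assumptions: build_add_language_call is a hypothetical helper, and it assumes a three-letter lowercase language code and a numeric list id.

```shell
# Hypothetical helper: prints the CALL statement for a language code and
# list id, after rejecting values that do not look right.
build_add_language_call() {
    lang=$1
    list_id=$2

    # Assumption: the ISO 639-3 code is three lowercase letters.
    case "$lang" in
        [a-z][a-z][a-z]) ;;
        *) echo "expected a three-letter ISO 639-3 code" >&2; return 1 ;;
    esac

    # The list id must be a non-empty string of digits.
    case "$list_id" in
        ''|*[!0-9]*) echo "expected a numeric list id" >&2; return 1 ;;
    esac

    printf "CALL add_new_language('%s', %s);\n" "$lang" "$list_id"
}
```

The output can then be piped to the MySQL client, for example build_add_language_call epo 42 | mysql -u <user> -p <database>, where the code, list id, user and database name are placeholders to replace with your own values.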
4. Update the sphinx.conf
- Generate the sphinx.conf file with the CakePHP shell:
cake -app "/path/to/app" sphinx_conf > sphinx.conf
- Replace the old sphinx.conf with the new one.
- Re-index the sentences using the new conf:
indexer --all --rotate
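The three steps above can be chained in a small script. The following is a sketch only: update_sphinx_conf and the DRY_RUN variable are hypothetical, and the location of the live sphinx.conf is an assumption that depends on your installation, so it is passed as a parameter.

```shell
# Hypothetical wrapper around the three steps above.
#   $1: path to the CakePHP app directory
#   $2: path of the live sphinx.conf (site-specific, assumed here)
# If DRY_RUN is non-empty, the commands are printed instead of executed.
update_sphinx_conf() {
    app_path=$1
    conf_path=$2
    new_conf="$conf_path.new"

    if [ -n "${DRY_RUN:-}" ]; then
        printf 'cake -app "%s" sphinx_conf > %s\n' "$app_path" "$new_conf"
        printf 'mv %s %s\n' "$new_conf" "$conf_path"
        printf 'indexer --all --rotate\n'
        return 0
    fi

    # Generate the new configuration next to the old one; stop on failure
    # so that the old file is never replaced by a broken one.
    cake -app "$app_path" sphinx_conf > "$new_conf" || return 1

    # Replace the old sphinx.conf with the new one.
    mv "$new_conf" "$conf_path" || return 1

    # Re-index the sentences using the new conf.
    indexer --all --rotate
}
```

Running it once with DRY_RUN=1 set lets you review the commands before anything is changed.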