Version at: 23/02/2014, 22:30 vs. version at: 04/03/2014, 00:32

Version at: 23/02/2014, 22:30

# Adding a New Language to the Corpus (for Developers)

## Introduction

These are the instructions for adding a language to the Tatoeba corpus. Instructions for adding a language in which the Tatoeba UI will be displayed are found elsewhere.

These instructions were copied from [Assembla](https://www.assembla.com/spaces/tatoeba2/wiki/Adding_a_language_in_Tatoeba). They have not yet been verified or updated for current use.

## FAQ for users
For users who want to add a new language, see the FAQ entry [How to request a new language](http://tatoeba.org/eng/faq#new-language).

## Language icon
1. Create the icon for the language. The icon must be a 30x20 PNG file. In theory, each icon has a 1px line of color #dcdcdc along its bottom and right edges, and most icons have also had their luminosity adjusted so that they look slightly paler than the original image. These details are secondary for now; what matters most is that the icon is a 30x20 PNG file.

2. Commit the image to the repository. The icons are stored in the app/webroot/img/flags folder. If you do not have repository access and do not want to obtain it, ask someone who has it to commit the image for you.

3. Update the app/webroot/img/flags directory on the server so that it contains the new icons.

## Source code

A script on the server adds the new language code to the appropriate files. Run it as follows:

`./add_lang.sh <lang_code> <english_name> <list_id>`

For example:

`./add_lang.sh eng English 123`

Once the script has run, the new language is available on the website. The sentences in the list with ID `<list_id>` have their language set to the new language instead of being left as "unknown". The script edits the following files:

* **app/model/sentence.php**
Adds the language ISO code to the $validate array. Languages that are not part of this array are not allowed.

* **app/views/helpers/languages.php**
Adds the language ISO code and the name to the languagesArray() method.

* **docs/generate_sphinx_conf.php**
Adds the language ISO code and name to the $languages array. Also adds the ISO code to the $cjkLanguages array if the language uses Chinese, Japanese or Korean characters.

 
In addition, make this change:

* **app/webroot/img/flags/**
Add an icon for the new language: a 30x20 PNG file whose luminosity has been adjusted so that it looks slightly paler than the original, with a 1-pixel border (color #dcdcdc) on the right and bottom edges.


In the past, we also edited this file:

* **app/controllers/components/google_language_api.php**
Added the corresponding case to the google2TatoebaCode() method if Google supported detection of the language (see the Language enum).

Language detection is now handled by tatodetect, so this file no longer needs to be edited.

After you make your changes, commit your code to the repository, or have someone do it for you. See [Repositories](repositories).

## In your local Tatoeba

* Connect to MySQL and select the database.
* If you haven't done so yet, run the script docs/database/scripts/add_new_language.sql. It creates a stored procedure that adds a new language and makes the necessary updates to the database.
* CALL add_new_language(iso_code, list_id, tag_name);
* See the comments in add_new_language.sql for examples of how to call the procedure.
* Test that language detection works (or can work) by adding a sentence with 'auto-detect'. Tatoeba should have a list of sentences in the new language, named after that language.
* Test that you can change the language of a sentence to the new language.
* Check that the sentence count is displayed correctly in the language statistics.
* If everything works, commit, referring to ticket #225 in your commit message (e.g. "re #225") and listing the languages that were added. Also refer to any separate tickets opened to track these particular languages. The syntax for referring to multiple tickets is described here.

 
## On the dev

* Go to the 'dev' repository.
* Update it, if necessary.
* Connect to the MySQL database of the dev version.
* CALL add_new_language(iso_code, list_id, tag_name);
* Repeat the tests you ran locally.

 
## On the prod

* If everything works on the dev, go to the 'prod' repository.
* Run htop and check that the load is below 2.
* Update the repository, if necessary.
* Connect to the MySQL database of the prod version.
* CALL add_new_language(iso_code, list_id, tag_name);
* Exit the mysql client.
* Check that the sentences that were in the list, and those with the tag, now have the appropriate icon.
* Check that the language appears in the language statistics.
* cp /usr/local/etc/sphinx.conf /usr/local/etc/sphinx.conf.old
* php generate_sphinx_conf.php > /usr/local/etc/sphinx.conf
* In the new config file, set the user, password, database and port, using the old config file for reference.
* indexer --all --rotate & disown

## Historical information only
We used to follow this procedure for adding a new language (example: French):

* Create the folder /app/locale/fre/LC_MESSAGES.
* Copy default.pot into this folder.
* Rename it to default.po.
* Open default.po with PoEdit (http://www.poedit.net/) and translate it.
* Save. PoEdit generates a .mo file, which is used to replace strings at runtime.

When new strings were added, we did the following:

* Follow the CakePHP i18n instructions to generate the up-to-date POT file.
* Open the PO file (PO, not POT).
* In the menu, choose Catalog > Update from POT file…
* Choose the newly generated POT file.

The language of the page is set through the URL.
Example: http://localhost/tatoeba2/fre/sentences/index

Resources:

* http://blog.jaysalvat.com/articles/choix-des-langues-par-url-dans-cakephp.php
* http://www.formation-cakephp.com/41/multilingue-18n-l10n

Version at: 04/03/2014, 00:32

# Adding a New Language to the Corpus (for Developers)

## Introduction

These are the instructions for adding a language to the Tatoeba corpus. Instructions for adding a language in which the Tatoeba UI will be displayed are found elsewhere.

These instructions were copied from [Assembla](https://www.assembla.com/spaces/tatoeba2/wiki/Adding_a_language_in_Tatoeba). They have not yet been verified or updated for current use.

## FAQ for users
For users who want to add a new language, see the FAQ entry [How to request a new language](http://tatoeba.org/eng/faq#new-language).

## Language icon
1. Create the icon for the language. The icon must be a 30x20 PNG file. In theory, each icon has a 1px line of color #dcdcdc along its bottom and right edges, and most icons have also had their luminosity adjusted so that they look slightly paler than the original image. These details are secondary for now; what matters most is that the icon is a 30x20 PNG file.

2. Commit the image to the repository. The icons are stored in the app/webroot/img/flags folder. If you do not have repository access and do not want to obtain it, ask someone who has it to commit the image for you.

3. Update the app/webroot/img/flags directory on the server so that it contains the new icons.
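
The icon requirements above can be checked, and roughly produced, from the shell. This is a minimal sketch, not Tatoeba tooling: `make_flag_icon`, `png_size` and `check_flag_icon` are hypothetical names, `make_flag_icon` assumes ImageMagick's `convert` is installed, and its 110% brightness is only a guess at "a bit more pale". The size check needs nothing but `od` and `awk`, because every PNG stores its width and height at fixed offsets in the IHDR chunk.

```shell
# Hypothetical helpers for preparing and checking a flag icon.
# They are not part of the Tatoeba repository.

# make_flag_icon SRC DEST
# Assumes ImageMagick's `convert` is installed. Scales SRC to 29x19 pixels
# (ignoring the aspect ratio), raises its brightness to 110%, then pads it to
# 30x20 with #dcdcdc so that the 1px border appears only on the right and
# bottom edges. The brightness value is a guess, not a project standard.
make_flag_icon() {
    convert "$1" -resize '29x19!' -modulate 110 \
        -background '#dcdcdc' -gravity northwest -extent 30x20 \
        "PNG:$2"
}

# png_size FILE
# Prints WIDTHxHEIGHT. The IHDR chunk always comes first in a PNG file, so
# width and height are the big-endian 32-bit integers at byte offsets 16 and 20.
png_size() {
    od -An -tu1 -j16 -N8 "$1" | awk '{
        w = (($1 * 256 + $2) * 256 + $3) * 256 + $4
        h = (($5 * 256 + $6) * 256 + $7) * 256 + $8
        printf "%dx%d\n", w, h
    }'
}

# check_flag_icon FILE
# Succeeds only if FILE starts with the PNG signature and is exactly 30x20.
check_flag_icon() {
    sig=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    [ "$sig" = "89504e470d0a1a0a" ] && [ "$(png_size "$1")" = "30x20" ]
}
```

For example, `make_flag_icon esperanto.png epo.png && check_flag_icon epo.png` would build the icon and confirm its size before it is committed (the file names are illustrative).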

## Source code

A script was used on the server to add the new language code to the appropriate files. As of 2014-03-02, an updated version of the [script](https://github.com/Tatoeba/tatoeba2/blob/master/docs/add_lang.sh) has been checked into our repository, but it has not yet been uploaded to the server. The script takes the following parameters:
- the three-letter ISO 639 code (e.g., "epo" for Esperanto)
- the English name of the language (e.g., "Esperanto")
- the ID of a list containing at least 5 sentences in the given language (see the [list of lists](http://tatoeba.org/eng/sentences_lists/index))
- the string "dev" (on a development machine) or "prod" (on the server)
- the username for the database
- the password for the database
- the database name
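
Because the updated script takes seven parameters, it is easy to pass them in the wrong order. The wrapper below is a hypothetical sketch that assumes the script reads them positionally, in the order listed above (this has not been checked against add_lang.sh). It checks the basic shape of each argument and only then runs `./add_lang.sh` with the same arguments.

```shell
# Hypothetical wrapper, not part of the repository. Assumes add_lang.sh takes
# its seven parameters positionally, in this order:
# ISO_CODE ENGLISH_NAME LIST_ID ENV DB_USER DB_PASSWORD DB_NAME

# check_add_lang_args: returns 0 if the arguments look plausible, otherwise
# prints the first problem found to stderr and returns 1.
check_add_lang_args() {
    if [ "$#" -ne 7 ]; then
        echo "expected 7 arguments, got $#" >&2
        return 1
    fi
    case "$1" in
        [[:lower:]][[:lower:]][[:lower:]]) ;;
        *) echo "ISO code must be three lowercase letters: $1" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "English name is empty" >&2
        return 1
    fi
    case "$3" in
        ''|*[!0-9]*) echo "list ID must be a number: $3" >&2; return 1 ;;
    esac
    case "$4" in
        dev|prod) ;;
        *) echo "environment must be 'dev' or 'prod': $4" >&2; return 1 ;;
    esac
    # The password may legitimately be empty on a development machine.
    if [ -z "$5" ] || [ -z "$7" ]; then
        echo "database user and database name are required" >&2
        return 1
    fi
    return 0
}

# run_add_lang: validate the arguments, then pass them unchanged to the script.
run_add_lang() {
    check_add_lang_args "$@" || return 1
    ./add_lang.sh "$@"
}
```

For example: `run_add_lang epo Esperanto 123 dev db_user db_password tatoeba`, where the list ID and the database credentials are placeholders.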

Once the script has run, the new language is available on the website. The sentences in the list whose ID was passed to the script have their language set to the new language instead of being left as "unknown". The script edits the following files:

* **app/model/sentence.php**
Adds the language ISO code to the $validate array. Languages that are not part of this array are not allowed.

* **app/views/helpers/languages.php**
Adds the language ISO code and the name to the languagesArray() method.

* **docs/generate\_sphinx\_conf.php**
Adds the language ISO code and name to the $languages array. Also adds the ISO code to the $cjkLanguages array if the language uses Chinese, Japanese or Korean characters.
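
After running the script, one way to confirm that it touched all three files is to search each of them for the new code. This sketch assumes the code appears as a quoted string (for example `'epo'` or `"epo"`) in each file, which has not been checked against the actual PHP sources; `check_lang_in_sources` is a hypothetical helper that takes the repository root as an optional second argument.

```shell
# check_lang_in_sources CODE [ROOT]
# Hypothetical check: reports each edited file that does not mention the
# quoted ISO code, and returns 1 if any file is missing it.
check_lang_in_sources() {
    code=$1
    root=${2:-.}
    missing=0
    for f in app/model/sentence.php \
             app/views/helpers/languages.php \
             docs/generate_sphinx_conf.php; do
        # A missing file makes grep fail too, so it is reported as well.
        if ! grep -q -e "'$code'" -e "\"$code\"" "$root/$f" 2>/dev/null; then
            echo "missing in $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```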

 
In addition, make this change:

* **app/webroot/img/flags/**
Add an icon for the new language: a 30x20 PNG file whose luminosity has been adjusted so that it looks slightly paler than the original, with a 1-pixel border (color #dcdcdc) on the right and bottom edges.


In the past, we also edited this file:

* **app/controllers/components/google\_language\_api.php**
Added the corresponding case to the google2TatoebaCode() method if Google supported detection of the language (see the Language enum).

Language detection is now handled by tatodetect, so this file no longer needs to be edited.

After you make your changes, commit your code to the repository, or have someone do it for you. See [Repositories](repositories).

## In your local Tatoeba

* Connect to MySQL and select the database.
* If you haven't done so yet, run the script docs/database/scripts/add\_new\_language.sql. It creates a stored procedure that adds a new language and makes the necessary updates to the database.
* CALL add\_new\_language(iso\_code, list\_id, tag\_name);
* See the comments in add\_new\_language.sql for examples of how to call the procedure.
* Test that language detection works (or can work) by adding a sentence with 'auto-detect'. Tatoeba should have a list of sentences in the new language, named after that language.
* Test that you can change the language of a sentence to the new language.
* Check that the sentence count is displayed correctly in the language statistics.
* If everything works, commit, referring to ticket #225 in your commit message (e.g. "re #225") and listing the languages that were added. Also refer to any separate tickets opened to track these particular languages. The syntax for referring to multiple tickets is described here.
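
The two database steps can also be run non-interactively with the `mysql` command-line client. In this sketch, `DB_USER` and `DB_NAME` are placeholder variables, the helper names are hypothetical, and the argument types (a quoted ISO code, a numeric list ID and a quoted tag name) are inferred from the parameter names rather than checked against add\_new\_language.sql. `add_new_language_sql` only builds the statement, doubling any single quotes in the text arguments, so it can be inspected before anything touches the database.

```shell
# Hypothetical helpers; DB_USER and DB_NAME are placeholders you must set.
# Argument types (quoted ISO code, numeric list ID, quoted tag name) are
# assumed from the parameter names, not checked against add_new_language.sql.

# add_new_language_sql ISO_CODE LIST_ID TAG_NAME
# Prints the CALL statement, doubling single quotes in the text arguments.
add_new_language_sql() {
    code=$(printf '%s' "$1" | sed "s/'/''/g")
    tag=$(printf '%s' "$3" | sed "s/'/''/g")
    printf "CALL add_new_language('%s', %d, '%s');\n" "$code" "$2" "$tag"
}

# install_procedure: load the procedure definition (only needed once).
# -p without a value makes mysql prompt for the password.
install_procedure() {
    mysql -u "$DB_USER" -p "$DB_NAME" < docs/database/scripts/add_new_language.sql
}

# call_add_new_language ISO_CODE LIST_ID TAG_NAME: run the CALL statement.
call_add_new_language() {
    mysql -u "$DB_USER" -p -e "$(add_new_language_sql "$@")" "$DB_NAME"
}
```

For example, with `DB_USER` and `DB_NAME` set, run `install_procedure` once and then `call_add_new_language epo 123 Esperanto`. The same CALL is what the dev and prod steps run against their own databases.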

 
## On the dev

* Go to the 'dev' repository.
* Update it, if necessary.
* Connect to the MySQL database of the dev version.
* CALL add\_new\_language(iso\_code, list\_id, tag\_name);
* Repeat the tests you ran locally.

 
## On the prod

* If everything works on the dev, go to the 'prod' repository.
* Run htop and check that the load is below 2.
* Update the repository, if necessary.
* Connect to the MySQL database of the prod version.
* CALL add\_new\_language(iso\_code, list\_id, tag\_name);
* Exit the mysql client.
* Check that the sentences that were in the list, and those with the tag, now have the appropriate icon.
* Check that the language appears in the language statistics.
* cp /usr/local/etc/sphinx.conf /usr/local/etc/sphinx.conf.old
* php generate\_sphinx\_conf.php > /usr/local/etc/sphinx.conf
* In the new config file, set the user, password, database and port, using the old config file for reference.
* indexer --all --rotate & disown
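
On a Linux server, the load check and the Sphinx steps above can be wrapped in small shell functions. This is a sketch under stated assumptions: the helper names are hypothetical, the load is read from /proc/loadavg (its first field is the 1-minute average that htop also displays), and `reindex` uses `nohup` as a POSIX-sh stand-in for the shell-specific `& disown`. Unlike the manual steps, `regenerate_sphinx_conf` writes to a temporary file first, so a failing `php` run does not leave a truncated sphinx.conf.

```shell
# Hypothetical helpers for the prod steps (Linux only).

# load_below LIMIT [LOADAVG_FILE]
# Succeeds if the 1-minute load average (first field of /proc/loadavg)
# is below LIMIT.
load_below() {
    awk -v limit="$1" '{ exit !($1 < limit) }' "${2:-/proc/loadavg}"
}

# regenerate_sphinx_conf: keep a backup, then install the generated config.
# Run it from the directory that contains generate_sphinx_conf.php. The new
# file still needs the user, password, database and port set by hand.
regenerate_sphinx_conf() {
    conf=/usr/local/etc/sphinx.conf
    cp "$conf" "$conf.old" &&
        php generate_sphinx_conf.php > "$conf.new" &&
        mv "$conf.new" "$conf"
}

# reindex: rebuild all indexes in the background so the job survives logout.
reindex() {
    nohup indexer --all --rotate > /dev/null 2>&1 &
}
```

For example: `load_below 2 && regenerate_sphinx_conf`, then edit the user, password, database and port in the new file, then `reindex`.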

## Historical information only
We used to follow this procedure for adding a new language (example: French):

* Create the folder /app/locale/fre/LC_MESSAGES.
* Copy default.pot into this folder.
* Rename it to default.po.
* Open default.po with PoEdit (http://www.poedit.net/) and translate it.
* Save. PoEdit generates a .mo file, which is used to replace strings at runtime.

When new strings were added, we did the following:

* Follow the CakePHP i18n instructions to generate the up-to-date POT file.
* Open the PO file (PO, not POT).
* In the menu, choose Catalog > Update from POT file…
* Choose the newly generated POT file.

The language of the page is set through the URL.
Example: http://localhost/tatoeba2/fre/sentences/index

Resources:

* http://blog.jaysalvat.com/articles/choix-des-langues-par-url-dans-cakephp.php
* http://www.formation-cakephp.com/41/multilingue-18n-l10n
