Version at: 22/02/2014, 19:20 vs. version at: 22/02/2014, 19:30
--- Version at: 22/02/2014, 19:20
+++ Version at: 22/02/2014, 19:30
  # Adding a New Language to the Corpus (for Developers)
+
+ ##Introduction

  These are the instructions for adding a language to the Tatoeba corpus. Instructions for adding a language in which the Tatoeba UI will be displayed are found elsewhere.

  These instructions were copied from [Assembla](https://www.assembla.com/spaces/tatoeba2/wiki/Adding_a_language_in_Tatoeba). They have not yet been verified for current use, or updated.

+ ##Instructions
  The FAQ for users who want to add a new language: http://tatoeba.org/eng/faq#new-language
- 1. In the source code
+
+ ###In the source code

  There is a script on the server, add_lang.sh, that will modifiy the right files to add the new language code to the appropriate files. The script is called add_lang.sh. It can be executed as followed:

  ./add_lang.sh <lang_code> <english_name> <list_id>

  For example:

  ./add_lang.sh eng English 123

  Once this script is executed, the new languages will be available on the website. The sentences that were in the list with id <list_id> will have their language set to the new language (instead of being set to language unknown). This script edits the following files:

- app/model/sentence.php
+ * app/model/sentence.php
  Adds the language ISO code to the $validate array. Languages that are not part of this array are not allowed.
- app/views/helpers/languages.php

+ * app/views/helpers/languages.php
  Adds the language ISO code and the name to the languagesArray() method.

- docs/generate_sphinx_conf.php
+ * docs/generate_sphinx_conf.php
  Adds the language ISO code and name to the $languages array. Also adds the ISO code to the $cjkLanguages array if the language uses Chinese, Japanese or Korean characters.

-
  In addition, make this change:

- app/webroot/img/flags/
+ * app/webroot/img/flags/
  Add an icon for the new language. Dimensions 30 x 20. Format png. Modify luminosity so that it looks a bit more pale than the original and add a 1 pixel border on right and bottom (color #dcdcdc).

  In the past, we used to edit this:

- app/controllers/components/google_language_api.php
+ * app/controllers/components/google_language_api.php
  Adds the corresponding case to the google2TatoebaCode() method, if Google supports the detection for the language. See the Language enum.
-
-

  but now tatodetect takes care of language detection.

  After you make your changes, commit your code to the SVN repository.
- 2. In your local Tatoeba
+
+ ##In your local Tatoeba

  Connect to mysql and select the database.
  If you haven't done it yet, run the following script: docs/database/scripts/add_new_language.sql. It will create a procedure to easily add a new language and do the necessary updates in the database.
  CALL add_new_language(iso_code, list_id, tag_name);
  Read the comments in add_new_language.sql to have examples of the procedure.
  Test that the language detection works (or can work) by adding a sentence with 'auto-detect'. There should be on Tatoeba a list of sentences in the language in question (named after the language in question).
  Test that you can change the language of a sentence into the language in question.
  Check that the count displays properly in the languages stats.
  If it's all fine, commit and refer to the ticket #225 in your comment (=> re #225) and indicate the languages that were added. Also refer to any separate tickets that were added to track adding these languages in particular. The syntax for referring to multiple tickets is described here.

- 3. On the dev
+ ##On the dev

- Go to the 'dev' repertory.
+ Go to the 'dev' repository.
  svn up
  Connect to the mysql database of the dev version.
  CALL add_new_language(iso_code, list_id, tag_name);
  Test the same things you have tested in local.

- 4. On the prod
+ ##On the prod

- If everything is fine with the dev, go the the 'prod' repertory.
+ If everything is fine with the dev, go to the 'prod' repository.
  htop
  Check that the load is below 2.
  svn up
  Connect to the mysql database of the prod version.
  CALL add_new_language(iso_code, list_id, tag_name);
  exit
  Check that the sentences that were in the list and tags have now the appropriate icon.
  Check that the language appears in the languages stats.
  cp /usr/local/etc/sphinx.conf /usr/local/etc/sphinx.conf.old
  php generate_sphinx_conf.php > /usr/local/etc/sphinx.conf
  Change the necessary things in the new config file (user, password, database and port). Look at the old conf file for reference.
  indexer --all --rotate & disown

diff view generated by jsdifflib

Version at: 22/02/2014, 19:20

# Adding a New Language to the Corpus (for Developers)

These are the instructions for adding a language to the Tatoeba corpus. Instructions for adding a language in which the Tatoeba UI will be displayed are found elsewhere.

These instructions were copied from [Assembla](https://www.assembla.com/spaces/tatoeba2/wiki/Adding_a_language_in_Tatoeba). They have not yet been verified for current use, or updated. 

The FAQ for users who want to add a new language: http://tatoeba.org/eng/faq#new-language
1. In the source code

There is a script on the server, add_lang.sh, that will modifiy the right files to add the new language code to the appropriate files. The script is called add_lang.sh. It can be executed as followed:

./add_lang.sh <lang_code> <english_name> <list_id>

For example: 

./add_lang.sh eng English 123

Once this script is executed, the new languages will be available on the website. The sentences that were in the list with id <list_id> will have their language set to the new language (instead of being set to language unknown). This script edits the following files:

app/model/sentence.php
Adds the language ISO code to the $validate array. Languages that are not part of this array are not allowed.
app/views/helpers/languages.php

Adds the language ISO code and the name to the languagesArray() method.

docs/generate_sphinx_conf.php
Adds the language ISO code and name to the $languages array. Also adds the ISO code to the $cjkLanguages array if the language uses Chinese, Japanese or Korean characters.

 

In addition, make this change:

app/webroot/img/flags/
Add an icon for the new language. Dimensions 30 x 20. Format png. Modify luminosity so that it looks a bit more pale than the original and add a 1 pixel border on right and bottom (color #dcdcdc).


In the past, we used to edit this:

app/controllers/components/google_language_api.php
Adds the corresponding case to the google2TatoebaCode() method, if Google supports the detection for the language. See the Language enum.

 

but now tatodetect takes care of language detection.

 

After you make your changes, commit your code to the SVN repository.
2. In your local Tatoeba

    Connect to mysql and select the database.
    If you haven't done it yet, run the following script: docs/database/scripts/add_new_language.sql. It will create a procedure to easily add a new language and do the necessary updates in the database.
    CALL add_new_language(iso_code, list_id, tag_name);
    Read the comments in add_new_language.sql to have examples of the procedure.
    Test that the language detection works (or can work) by adding a sentence with 'auto-detect'. There should be on Tatoeba a list of sentences in the language in question (named after the language in question).
    Test that you can change the language of a sentence into the language in question.
    Check that the count displays properly in the languages stats.
    If it's all fine, commit and refer to the ticket #225 in your comment (=> re #225) and indicate the languages that were added. Also refer to any separate tickets that were added to track adding these languages in particular. The syntax for referring to multiple tickets is described here. 

 
3. On the dev

    Go to the 'dev' repertory.
    svn up
    Connect to the mysql database of the dev version.
    CALL add_new_language(iso_code, list_id, tag_name);
    Test the same things you have tested in local. 

 
4. On the prod

    If everything is fine with the dev, go the the 'prod' repertory.
    htop
    Check that the load is below 2.
    svn up
    Connect to the mysql database of the prod version.
    CALL add_new_language(iso_code, list_id, tag_name);
    exit
    Check that the sentences that were in the list and tags have now the appropriate icon.
    Check that the language appears in the languages stats.
    cp /usr/local/etc/sphinx.conf /usr/local/etc/sphinx.conf.old
    php generate_sphinx_conf.php > /usr/local/etc/sphinx.conf
    Change the necessary things in the new config file (user, password, database and port). Look at the old conf file for reference.
    indexer --all --rotate & disown

Version at: 22/02/2014, 19:30

# Adding a New Language to the Corpus (for Developers)

##Introduction

These are the instructions for adding a language to the Tatoeba corpus. Instructions for adding a language in which the Tatoeba UI will be displayed are found elsewhere.

These instructions were copied from [Assembla](https://www.assembla.com/spaces/tatoeba2/wiki/Adding_a_language_in_Tatoeba). They have not yet been verified for current use, or updated. 

##Instructions
The FAQ for users who want to add a new language: http://tatoeba.org/eng/faq#new-language

###In the source code

There is a script on the server, add_lang.sh, that will modifiy the right files to add the new language code to the appropriate files. The script is called add_lang.sh. It can be executed as followed:

./add_lang.sh <lang_code> <english_name> <list_id>

For example: 

./add_lang.sh eng English 123

Once this script is executed, the new languages will be available on the website. The sentences that were in the list with id <list_id> will have their language set to the new language (instead of being set to language unknown). This script edits the following files:

* app/model/sentence.php
Adds the language ISO code to the $validate array. Languages that are not part of this array are not allowed.

* app/views/helpers/languages.php
Adds the language ISO code and the name to the languagesArray() method.

* docs/generate_sphinx_conf.php
Adds the language ISO code and name to the $languages array. Also adds the ISO code to the $cjkLanguages array if the language uses Chinese, Japanese or Korean characters.
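
Before running the script, it can help to sanity-check its three arguments. The following is a minimal sketch, not part of Tatoeba's code: the function name `validate_add_lang_args` is hypothetical, and the rules it applies (a three-letter lowercase code as in the `eng` example, a non-empty name, and a numeric list id) are assumptions rather than documented requirements of add_lang.sh.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the arguments of add_lang.sh.
# Assumptions (not confirmed by the instructions above):
#   $1 lang_code    : exactly three lowercase ASCII letters, like "eng"
#   $2 english_name : any non-empty string
#   $3 list_id      : one or more decimal digits
validate_add_lang_args() {
    [ "$#" -eq 3 ] || return 1

    # Language code: exactly three characters, all of them a-z.
    case "$1" in
        ???) ;;
        *) return 1 ;;
    esac
    case "$1" in
        *[!abcdefghijklmnopqrstuvwxyz]*) return 1 ;;
    esac

    # English name: must not be empty.
    [ -n "$2" ] || return 1

    # List id: non-empty and digits only.
    case "$3" in
        ''|*[!0123456789]*) return 1 ;;
    esac

    return 0
}

if validate_add_lang_args eng English 123; then
    echo "arguments look valid"
    # ./add_lang.sh eng English 123   # uncomment when running on the server
fi
```

The character classes are spelled out instead of using ranges such as `[a-z]`, so the check does not depend on the locale's collation order.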

 
In addition, make this change:

* app/webroot/img/flags/
Add an icon for the new language. Dimensions 30 x 20. Format png. Modify luminosity so that it looks a bit more pale than the original and add a 1 pixel border on right and bottom (color #dcdcdc).


In the past, we used to edit this:

* app/controllers/components/google_language_api.php
Adds the corresponding case to the google2TatoebaCode() method, if Google supports the detection for the language. See the Language enum.

but now tatodetect takes care of language detection.

 

After you make your changes, commit your code to the SVN repository.

##In your local Tatoeba

    Connect to mysql and select the database.
    If you haven't done it yet, run the following script: docs/database/scripts/add_new_language.sql. It will create a procedure to easily add a new language and do the necessary updates in the database.
    CALL add_new_language(iso_code, list_id, tag_name);
    Read the comments in add_new_language.sql to have examples of the procedure.
    Test that the language detection works (or can work) by adding a sentence with 'auto-detect'. There should be on Tatoeba a list of sentences in the language in question (named after the language in question).
    Test that you can change the language of a sentence into the language in question.
    Check that the count displays properly in the languages stats.
    If it's all fine, commit and refer to the ticket #225 in your comment (=> re #225) and indicate the languages that were added. Also refer to any separate tickets that were added to track adding these languages in particular. The syntax for referring to multiple tickets is described here. 

 
##On the dev

    Go to the 'dev' repository.
    svn up
    Connect to the mysql database of the dev version.
    CALL add_new_language(iso_code, list_id, tag_name);
    Test the same things you have tested in local. 

 
##On the prod

    If everything is fine with the dev, go to the 'prod' repository.
    htop
    Check that the load is below 2.
    svn up
    Connect to the mysql database of the prod version.
    CALL add_new_language(iso_code, list_id, tag_name);
    exit
    Check that the sentences that were in the list and tags have now the appropriate icon.
    Check that the language appears in the languages stats.
    cp /usr/local/etc/sphinx.conf /usr/local/etc/sphinx.conf.old
    php generate_sphinx_conf.php > /usr/local/etc/sphinx.conf
    Change the necessary things in the new config file (user, password, database and port). Look at the old conf file for reference.
    indexer --all --rotate & disown
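
The load check in the production steps could be scripted along these lines. This is only a sketch under assumptions: it reads the 1-minute load average from /proc/loadavg, which exists on Linux but not on every system (the instructions themselves use htop), and the function names `current_load` and `load_is_below` are made up for illustration.

```shell
#!/bin/sh
# Print the 1-minute load average (first field of /proc/loadavg, Linux only).
current_load() {
    cut -d ' ' -f 1 /proc/loadavg
}

# load_is_below THRESHOLD LOAD
# Succeeds (exit status 0) only if LOAD is strictly less than THRESHOLD.
# awk is used because the shell's own arithmetic is integer-only.
load_is_below() {
    awk -v t="$1" -v l="$2" 'BEGIN { exit !((l + 0) < (t + 0)) }'
}

if [ -r /proc/loadavg ]; then
    if load_is_below 2 "$(current_load)"; then
        echo "load is below 2, safe to continue"
    else
        echo "load is 2 or higher, wait before updating prod"
    fi
fi
```

The threshold of 2 comes from the step list; the comparison is strict, so a load of exactly 2 counts as too high.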

Note

The lines in green are the lines that have been added in the new version. The lines in red are those that have been removed.