Version at: 11/11/2018, 15:46 vs. version at: 16/11/2019, 23:37
The only difference between the two versions is a notice added below the title in the newer one:

[Information on this page is outdated because this feature is currently disabled and is to be rewritten at some point.]

Version at: 11/11/2018, 15:46

# How to mass import sentences

Tatoeba has a feature to mass import sentences, but it is restricted to admins. It is not designed to be user-friendly for all contributors and can be harmful in the wrong hands.

The page to mass import sentences is: [https://tatoeba.org/eng/sentences/import](https://tatoeba.org/eng/sentences/import)

There are two sections on this page:

- Single sentences
- Sentences and translations

## Single sentences

This section allows you to import a list of sentences that are all in the **same language**. You cannot import a list of sentences in a mix of different languages.

The format of the file is simple: each line is one sentence. 

Example: [https://gist.github.com/trang/50497ce80f494d2be801879ed3193129](https://gist.github.com/trang/50497ce80f494d2be801879ed3193129)


The form to import sentences has 3 fields:

- **Language of the sentences**: a dropdown for you to choose the language of the sentences
- **File**: a button which will open a popup for you to select the file to import
- **Numeric user id**: a text input for you to enter the ID of the user

## Sentences and translations

This section allows you to import sentences and their translations. Again, languages cannot be mixed: all sentences must share one language, and all translations must share one language.

The format of the file is `sentence [tab] translation`. The separator between the sentence and translation must be a tab. It cannot be a space or sequence of spaces.

Example: [https://gist.github.com/trang/149ac3faddee6f3bdfde5ae97a303816](https://gist.github.com/trang/149ac3faddee6f3bdfde5ae97a303816)

The form has 4 fields:

- **Language of the sentences**: a dropdown to choose the language of the sentences
- **Language of the translations**: a dropdown to choose the language of the translations
- **File**: a button which will open a popup to select the file to import
- **Numeric user id**: a text input to enter the ID of the user

## How to find the user ID

* Go to the [List of all members](https://tatoeba.org/eng/users/all).
* Search for the user in the sidebar.
* This will lead you to the "latest activity" page for the user.
* This page can also be accessed from the user profile by clicking on the "show latest activity" link in the "Stats" section in the sidebar.
* Check the URL of the page. It will look like `https://tatoeba.org/eng/users/show/{number}`. The `{number}` is the user ID.

## Encoding requirement

The file that you upload must be encoded in **UTF-8**. If you import a file that is not encoded in UTF-8 and the sentences contain non-ASCII characters, the imported text will be garbled.

You can [check the encoding of your file online](https://nlp.fi.muni.cz/projects/chared/). Your file is encoded in UTF-8 if the result says `Detected by Chared: utf_8`.

There are also ways to check offline, depending on which OS you are using. The easiest approach is to search the web (for instance: [detect file encoding](http://google.com/search?q=detect+file+encoding)).

## Warnings

**(1)** There is no safety net with this feature. Once you click "import", all the sentences will be processed, and there is no easy way to cancel or revert.

  * Make sure the encoding is UTF-8.
  * Make sure you entered the correct user ID.
  * Make sure you selected the correct language(s).
  * Don't import files from just anyone who asks you. Only import from trusted contributors. 
     * To avoid importing a ton of bad quality sentences, you should import only sentences that are in the native/strongest language of the user.
     * Do not import the sentences if you suspect there may be copyright/license issues. Ask the user if they wrote all the sentences themselves.
  * Luckily, you no longer have to worry about duplicates because Tatoeba handles them. If you accidentally import a file that you have already imported, it should have no effect.
  * Do a test on the [dev website](https://dev.tatoeba.org) first, and only upload your file on the main website if it went well on the dev website.

**(2)** The mass import feature can slow down the website.

  * It's a good idea to limit the number of sentences imported per operation to 400 or fewer. Larger imports will work sometimes, but it's better for others using the website at the same time if you don't import too many at once.
  * It's also a good idea to import during the weekend rather than on weekdays, as there is usually less traffic. Tatoeba has the least traffic on Saturday and Sunday between 00:00 and 09:00 (GMT+1).

**(3)** The logs of the sentences will indicate *you* (the admin) as the person who created the sentences.

  * If you are importing sentences for another user, make sure they are aware of this and do not mind.
  * The user will still be assigned as the owner, but the logs will state that you are the contributor, and the "Contributions" stats will increase on *your profile* and not on the owner's profile.

**(4)** When sentences and translations are imported, the interface does not record which sentence is actually the original and which one is the translation. Both sentences and translations will be considered as original. 

  * The mass import feature functions as if someone were adding the sentence, then adding the translation separately as a new sentence (instead of clicking on the "Translate" button), and then using the linking feature to connect the sentence and the translation.
  * As a result, it doesn't matter whether the file you are importing has the translations in the first column and the original sentences in the second column, or vice versa.
  * However, this also means that the mass import feature does not reflect reality accurately. It loses information that is not completely critical, but could be useful in some cases (for instance for handling the permission to change the license of a sentence).

Version at: 16/11/2019, 23:37

# How to mass import sentences

[Information on this page is outdated because this feature is currently disabled and is to be rewritten at some point.]

Tatoeba has a feature to mass import sentences, but it is restricted to admins. It is not designed to be user-friendly for all contributors and can be harmful in the wrong hands.

The page to mass import sentences is: [https://tatoeba.org/eng/sentences/import](https://tatoeba.org/eng/sentences/import)

There are two sections on this page:

- Single sentences
- Sentences and translations

## Single sentences

This section allows you to import a list of sentences that are all in the **same language**. You cannot import a list of sentences in a mix of different languages.

The format of the file is simple: each line is one sentence. 

Example: [https://gist.github.com/trang/50497ce80f494d2be801879ed3193129](https://gist.github.com/trang/50497ce80f494d2be801879ed3193129)
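
Before uploading, it can help to check the file locally. The following is only a sketch and not part of Tatoeba: the function name `find_bad_lines` is hypothetical, and it assumes that a blank line or a line containing a tab is unwanted in a single-sentence file.

```python
def find_bad_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that are empty or contain a tab.

    Assumption (not stated by Tatoeba): in a one-sentence-per-line file,
    blank lines and tab characters are probably mistakes.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or "\t" in line:
            bad.append(number)
    return bad


sample = "I like tea.\n\nShe reads\tbooks.\nIt is raining.\n"
print(find_bad_lines(sample))  # [2, 3]
```

Lines 2 and 3 of the sample are reported because one is blank and the other contains a tab.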


The form to import sentences has 3 fields:

- **Language of the sentences**: a dropdown for you to choose the language of the sentences
- **File**: a button which will open a popup for you to select the file to import
- **Numeric user id**: a text input for you to enter the ID of the user

## Sentences and translations

This section allows you to import sentences and their translations. Again, languages cannot be mixed: all sentences must share one language, and all translations must share one language.

The format of the file is `sentence [tab] translation`. The separator between the sentence and translation must be a tab. It cannot be a space or sequence of spaces.

Example: [https://gist.github.com/trang/149ac3faddee6f3bdfde5ae97a303816](https://gist.github.com/trang/149ac3faddee6f3bdfde5ae97a303816)
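
The tab requirement is easy to get wrong, since some editors silently replace tabs with spaces. A rough local check could look like the sketch below. The function name `parse_pairs` is hypothetical, and the rule that each line must hold exactly one tab with non-empty text on both sides is an assumption drawn from the format described above, not from the import code itself.

```python
def parse_pairs(text: str) -> tuple[list[tuple[str, str]], list[int]]:
    """Split lines of the form 'sentence<TAB>translation'.

    Returns the well-formed pairs and the 1-based numbers of the lines
    that do not contain exactly one tab with text on both sides.
    """
    pairs, bad = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        parts = line.split("\t")
        if len(parts) == 2 and all(part.strip() for part in parts):
            pairs.append((parts[0], parts[1]))
        else:
            bad.append(number)
    return pairs, bad


# The second line uses spaces instead of a tab, so it is rejected.
sample = "Hello.\tBonjour.\nThank you.    Merci.\nGood night.\tBonne nuit.\n"
pairs, bad = parse_pairs(sample)
print(pairs)  # [('Hello.', 'Bonjour.'), ('Good night.', 'Bonne nuit.')]
print(bad)    # [2]
```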

The form has 4 fields:

- **Language of the sentences**: a dropdown to choose the language of the sentences
- **Language of the translations**: a dropdown to choose the language of the translations
- **File**: a button which will open a popup to select the file to import
- **Numeric user id**: a text input to enter the ID of the user

## How to find the user ID

* Go to the [List of all members](https://tatoeba.org/eng/users/all).
* Search for the user in the sidebar.
* This will lead you to the "latest activity" page for the user.
* This page can also be accessed from the user profile by clicking on the "show latest activity" link in the "Stats" section in the sidebar.
* Check the URL of the page. It will look like `https://tatoeba.org/eng/users/show/{number}`. The `{number}` is the user ID.
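
If you copy the URL rather than retyping the number, the last step can be sketched in a few lines. This is only an illustration: the function name `user_id_from_url` is hypothetical, the ID `12345` is made up, and the pattern assumes the URL has exactly the shape shown above with a purely numeric ID.

```python
import re
from typing import Optional

# Assumed URL shape, taken from the step above: .../eng/users/show/{number}
USER_URL = re.compile(r"^https://tatoeba\.org/eng/users/show/(\d+)/?$")


def user_id_from_url(url: str) -> Optional[int]:
    """Return the numeric user ID from a 'users/show' URL, or None."""
    match = USER_URL.match(url.strip())
    return int(match.group(1)) if match else None


print(user_id_from_url("https://tatoeba.org/eng/users/show/12345"))  # 12345
print(user_id_from_url("https://tatoeba.org/eng/users/all"))         # None
```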

## Encoding requirement

The file that you upload must be encoded in **UTF-8**. If you import a file that is not encoded in UTF-8 and the sentences contain non-ASCII characters, the imported text will be garbled.

You can [check the encoding of your file online](https://nlp.fi.muni.cz/projects/chared/). Your file is encoded in UTF-8 if the result says `Detected by Chared: utf_8`.

There are also ways to check offline, depending on which OS you are using. The easiest approach is to search the web (for instance: [detect file encoding](http://google.com/search?q=detect+file+encoding)).
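
One offline check that works on any OS with Python installed is to try decoding the file as UTF-8. The sketch below is an illustration only; the function name `is_utf8` is hypothetical. Note that it only tells you whether the bytes are valid UTF-8. It does not identify which other encoding a rejected file uses.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_utf8("Ça va ?".encode("utf-8")))    # True
print(is_utf8("Ça va ?".encode("latin-1")))  # False
```

For a real file, you could pass its raw bytes, for example `is_utf8(pathlib.Path("sentences.txt").read_bytes())`, where `sentences.txt` is a placeholder file name.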

## Warnings

**(1)** There is no safety net with this feature. Once you click "import", all the sentences will be processed, and there is no easy way to cancel or revert.

  * Make sure the encoding is UTF-8.
  * Make sure you entered the correct user ID.
  * Make sure you selected the correct language(s).
  * Don't import files from just anyone who asks you. Only import from trusted contributors. 
     * To avoid importing a ton of bad quality sentences, you should import only sentences that are in the native/strongest language of the user.
     * Do not import the sentences if you suspect there may be copyright/license issues. Ask the user if they wrote all the sentences themselves.
  * Luckily, you no longer have to worry about duplicates because Tatoeba handles them. If you accidentally import a file that you have already imported, it should have no effect.
  * Do a test on the [dev website](https://dev.tatoeba.org) first, and only upload your file on the main website if it went well on the dev website.

**(2)** The mass import feature can slow down the website.

  * It's a good idea to limit the number of sentences imported per operation to 400 or fewer. Larger imports will work sometimes, but it's better for others using the website at the same time if you don't import too many at once.
  * It's also a good idea to import during the weekend rather than on weekdays, as there is usually less traffic. Tatoeba has the least traffic on Saturday and Sunday between 00:00 and 09:00 (GMT+1).
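
To stay within the suggested limit of 400 sentences per operation, a large file can be split into smaller batches before uploading. The following is a minimal sketch, not a Tatoeba tool: the function name `split_into_batches` is hypothetical, and each batch would still have to be saved to its own file and imported separately.

```python
def split_into_batches(lines: list[str], batch_size: int = 400) -> list[list[str]]:
    """Split a list of lines into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


sentences = [f"Sentence {n}." for n in range(1, 1001)]
batches = split_into_batches(sentences)
print([len(batch) for batch in batches])  # [400, 400, 200]
```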

**(3)** The logs of the sentences will indicate *you* (the admin) as the person who created the sentences.

  * If you are importing sentences for another user, make sure they are aware of this and do not mind.
  * The user will still be assigned as the owner, but the logs will state that you are the contributor, and the "Contributions" stats will increase on *your profile* and not on the owner's profile.

**(4)** When sentences and translations are imported, the interface does not record which sentence is actually the original and which one is the translation. Both sentences and translations will be considered as original. 

  * The mass import feature functions as if someone were adding the sentence, then adding the translation separately as a new sentence (instead of clicking on the "Translate" button), and then using the linking feature to connect the sentence and the translation.
  * As a result, it doesn't matter whether the file you are importing has the translations in the first column and the original sentences in the second column, or vice versa.
  * However, this also means that the mass import feature does not reflect reality accurately. It loses information that is not completely critical, but could be useful in some cases (for instance for handling the permission to change the license of a sentence).
