Version at: 10/11/2018, 01:49 vs. version at: 18/11/2019, 00:30

Version at: 10/11/2018, 01:49

# Using the Tatoeba Corpus for Your Own Projects

## Terms of Use

* Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
* Note that the Terms of Use for the audio files are not the same as those for the text of sentences. See the [Lists of Who Recorded Which Audio](http://aitech.ac.jp/~iteslj/a4esl/temporary/tatoeba/lists/audio.html) for lists of sentences and the license, if any, under which each contributor has offered their files for use outside of tatoeba.org.

## Warning: The Tatoeba Corpus is not error-free.

* Due to the nature of a public collaborative project, this data will never be 100% free of errors.
* Be aware of the following:
   * Though we recommend native-speaker contributions, a number of non-native speakers have contributed in languages they are learning.
   * We ask our members not to change archaic language to something that currently sounds natural.
   * Translations may not always be accurate, even when each of the linked sentences is itself a correct sentence.

## Suggestions for Those Planning to Use the Corpus

* Don't use the whole corpus as-is; first filter out obviously suspect items, such as those tagged "@need native check", "@change", "archaic", or "non-sentence". [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.
* You may want to eliminate all sentences not "owned" by native speakers.  However, even this will not guarantee perfect data.  *(See [Tatoeba.org Native Speakers](http://bit.ly/nativespeakers) maintained by CK)*
* You should inform your audience that the data may contain errors *(See [an example](http://tatoeba.org/eng/sentences/show/2535464))* and explain what steps you have taken to help minimize the errors.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
* If you are creating materials for people studying a foreign language, you might want to use only sentences you have personally proofread. This helps ensure that you are not teaching a mistake. *(See a live example on the right side of this page: [http://www.manythings.org/bilingual/](http://www.manythings.org/bilingual/).)*

## Download the Tatoeba Corpus

* [Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.

Version at: 18/11/2019, 00:30

# Using the Tatoeba Corpus for Your Own Projects

## Terms of Use

* Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
* Note that the Terms of Use for the audio files are not the same as those for the text of sentences. See [the list of audio lists](https://tatoeba.org/eng/sentences_lists/of_user/CK/audio%20-/page:1/sort:modified/direction:desc) for lists of sentences and the license, if any, under which each contributor has offered their files for use outside of tatoeba.org.

## Warning: The Tatoeba Corpus is not error-free.

* Due to the nature of a public collaborative project, this data will never be 100% free of errors.
* Be aware of the following:
   * Though we recommend native-speaker contributions, a number of non-native speakers have contributed in languages they are learning.
   * We ask our members not to change archaic language to something that currently sounds natural.
   * Translations may not always be accurate, even when each of the linked sentences is itself a correct sentence.

## Suggestions for Those Planning to Use the Corpus

* Don't use the whole corpus as-is; first filter out obviously suspect items, such as those tagged "@need native check", "@change", "archaic", or "non-sentence". [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.
* You may want to eliminate all sentences not "owned" by native speakers.  However, even this will not guarantee perfect data.  *(See [Tatoeba.org Native Speakers](http://bit.ly/nativespeakers) maintained by CK)*
* You should inform your audience that the data may contain errors *(See [an example](http://tatoeba.org/eng/sentences/show/2535464))* and explain what steps you have taken to help minimize the errors.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
* If you are creating materials for people studying a foreign language, you might want to use only sentences you have personally proofread. This helps ensure that you are not teaching a mistake. *(See a live example on the right side of this page: [http://www.manythings.org/bilingual/](http://www.manythings.org/bilingual/).)*
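
The first two suggestions above (dropping items with suspect tags and keeping only sentences "owned" by native speakers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sentence` record, the `tags_by_id` mapping, and the `native_speakers` set are hypothetical in-memory structures, not the layout of the downloadable files, so you would need to build them yourself from whatever files you download. Tag names are compared exactly as written here, which may not match how they appear in the real data.

```python
from dataclasses import dataclass

# Tags named in the suggestions above; extend this set after browsing the tag list.
SUSPECT_TAGS = {"@need native check", "@change", "archaic", "non-sentence"}


@dataclass(frozen=True)
class Sentence:
    # Hypothetical record layout; adapt it to the files you actually download.
    sentence_id: int
    lang: str
    text: str
    owner: str


def filter_sentences(sentences, tags_by_id, native_speakers):
    """Keep sentences with no suspect tag that are owned by a native speaker.

    tags_by_id: dict mapping a sentence id to a set of tag names (assumed shape).
    native_speakers: set of (username, lang) pairs (assumed shape).
    """
    kept = []
    for s in sentences:
        tags = tags_by_id.get(s.sentence_id, set())
        if tags & SUSPECT_TAGS:
            continue  # carries at least one suspect tag
        if (s.owner, s.lang) not in native_speakers:
            continue  # owner is not listed as a native speaker of this language
        kept.append(s)
    return kept


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Sentence(1, "eng", "This is a test.", "alice"),
        Sentence(2, "eng", "Thou art kind.", "alice"),
        Sentence(3, "fra", "Bonjour.", "alice"),
    ]
    tags = {2: {"archaic"}}
    natives = {("alice", "eng")}
    for s in filter_sentences(sample, tags, natives):
        print(s.sentence_id, s.text)
```

As the list above warns, a filter like this reduces errors but does not guarantee perfect data, so it complements personal proofreading rather than replacing it.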

## Download the Tatoeba Corpus

* [Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.
