Version at: 17/05/2020, 23:57 vs. version at: 28/05/2020, 11:47

Version at: 17/05/2020, 23:57

# Using the Tatoeba Corpus for Your Own Projects

## Terms of Use

Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use). Note that the terms of use for the audio files are not the same as for sentence text. Check the [list of audio lists](https://tatoeba.org/eng/sentences_lists/of_user/CK/audio%20-/page:1/sort:modified/direction:desc) for the license, if any, under which each audio contributor has offered their files for use outside of tatoeba.org. You should verify these licenses by clicking "audio files" on each member's profile.

## Processing the Tatoeba Corpus

You will probably want to filter out sentences that:

* require correction or improvement
* sound unnatural
* are poor or unnatural translations of other sentences

You may also want to filter out those that:

* contain vulgar language or sexual references
* contain archaic or old-fashioned content
* are particularly long

You can use various forms of metadata to aid with this process:

* tags (for instance, "@change", "archaic", "vulgar"; see [Tags](http://tatoeba.org/eng/tags/view_all) for more)
* sentence ratings
* contributors' self-reported skill in the language (as indicated in their profiles)

If you are using the data to create language learning materials:

* You should probably use only sentences that you or someone else has personally proofread and not rejected, since you do not want to be teaching people errors.

Note that most sentences that do not have errors are not explicitly marked with an "OK" rating or tag, and some sentences that do have errors are not marked with a negative rating or tag. Taking all of this into account, you will probably need to perform both custom automated processing and manual review.

## Suggestions for Those Planning to Use the Corpus

* Tell your audience how you selected the sentences.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.

## Download the Tatoeba Corpus

[Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.

## FAQ

* [How do I give proper attribution?](https://en.wiki.tatoeba.org/articles/show/faq#i-would-like-to-use-tatoeba's-data-for-my-project.)
* [Where can I download Tatoeba's audio data?](https://en.wiki.tatoeba.org/articles/show/faq#where-can-i-download-tatoeba's-audio-data?)
* [How can I download all sentences and translations in specific languages?](https://en.wiki.tatoeba.org/articles/show/faq#how-can-i-download-all-sentences-and-translations-)

Version at: 28/05/2020, 11:47

# Using the Tatoeba Corpus for Your Own Projects

## Terms of Use

Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use). 

Note that the terms of use for the audio files are not the same as for sentence text. Check the [list of audio lists](https://tatoeba.org/eng/sentences_lists/of_user/CK/audio%20-/page:1/sort:modified/direction:desc) for the license, if any, under which each audio contributor has offered their files for use outside of tatoeba.org. You should verify these licenses by clicking "audio files" on each member's profile.

## Processing the Tatoeba Corpus

You will probably want to filter out sentences that:

* require correction or improvement
* sound unnatural
* are poor or unnatural translations of other sentences

You may also want to filter out those that:

* contain vulgar language or sexual references
* contain archaic or old-fashioned content
* are particularly long

You can use various forms of metadata to aid with this process:

* tags (for instance, "@change", "archaic", "vulgar"; see [Tags](http://tatoeba.org/eng/tags/view_all) for more)
* sentence ratings
* contributors' self-reported skill in the language (as indicated in their profiles)
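As a rough illustration, the sketch below applies two of these filters to the downloaded data. It drops sentences that carry any of the example tags above, and it drops sentences longer than a cutoff. It assumes that the `sentences.csv` and `tags.csv` exports are tab-separated with no header row, with columns `sentence_id, lang, text` and `sentence_id, tag_name` respectively. Treat that layout as an assumption and check it against the download page. The function names and the 100-character cutoff are hypothetical choices for illustration.

```python
import csv
from typing import Iterable, Iterator, Set, Tuple

# Example tags taken from the list above; extend as needed.
EXCLUDED_TAGS = {"@change", "archaic", "vulgar"}
# Hypothetical cutoff for "particularly long" sentences, in characters.
MAX_CHARS = 100


def flagged_ids(tag_lines: Iterable[str], excluded: Set[str] = EXCLUDED_TAGS) -> Set[str]:
    """Return the IDs of sentences that carry any excluded tag.

    Assumes tab-separated rows of the form: sentence_id, tag_name.
    """
    ids: Set[str] = set()
    for row in csv.reader(tag_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[1] in excluded:
            ids.add(row[0])
    return ids


def filter_sentences(
    sentence_lines: Iterable[str],
    tag_lines: Iterable[str],
    lang: str,
    max_chars: int = MAX_CHARS,
) -> Iterator[Tuple[str, str]]:
    """Yield (sentence_id, text) for short, untagged sentences in `lang`.

    Assumes tab-separated rows of the form: sentence_id, lang, text.
    """
    bad = flagged_ids(tag_lines)
    for row in csv.reader(sentence_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        sentence_id, sentence_lang, text = row[0], row[1], row[2]
        if sentence_lang == lang and sentence_id not in bad and len(text) <= max_chars:
            yield sentence_id, text


if __name__ == "__main__":
    sentences = ["1\teng\tHello.", "2\teng\tThou art a knave.", "3\tfra\tBonjour."]
    tags = ["2\tarchaic"]
    print(list(filter_sentences(sentences, tags, "eng")))  # [('1', 'Hello.')]
```

To run it on the real exports, open each file with `open(path, encoding="utf-8", newline="")` and pass the file objects in place of the sample lists. Using `csv.QUOTE_NONE` keeps quotation marks inside sentences from being read as CSV quoting. A character count is only a crude length measure, so you may want a different cutoff per language.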

If you are using the data to create language learning materials:

* You should probably use only sentences that you or someone else has personally proofread and not rejected, since you do not want to be teaching people errors.

Note that most sentences that do not have errors are not explicitly marked with an "OK" rating or tag, and some sentences that do have errors are not marked with a negative rating or tag. Taking all of this into account, you will probably need to perform both custom automated processing and manual review.
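One way to approximate "proofread and not rejected" automatically is to keep only sentences with at least one positive rating and no negative ones. The sketch below assumes that the ratings export (`users_sentences.csv`) is tab-separated with columns `username, sentence_id, rating, ...`. It also assumes that `1` means OK, `0` means unsure and `-1` means not OK. Verify that layout against the download page before relying on it. Most correct sentences carry no rating at all, so this yields a small, conservative subset and does not replace manual review.

```python
import csv
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


def reviewed_ok_ids(rating_lines: Iterable[str]) -> Set[str]:
    """Return IDs rated OK by at least one user and "not OK" by nobody.

    Assumes tab-separated rows of the form: username, sentence_id, rating, ...
    where rating is "1" (OK), "0" (unsure) or "-1" (not OK).
    """
    ratings: DefaultDict[str, List[int]] = defaultdict(list)
    for row in csv.reader(rating_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        try:
            ratings[row[1]].append(int(row[2]))
        except ValueError:
            continue  # skip rows whose rating is not an integer
    # Unrated sentences never appear in `ratings`, so they are excluded too.
    return {sid for sid, values in ratings.items() if 1 in values and -1 not in values}


if __name__ == "__main__":
    rows = ["alice\t1\t1", "bob\t1\t-1", "carol\t2\t1", "dave\t3\t0"]
    print(reviewed_ok_ids(rows))  # {'2'}
```

Intersect the returned set with whatever other criteria you apply. A sentence that someone has rated "not OK" is dropped even if others rated it OK. This is a deliberately strict choice that you may want to relax.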

## Suggestions for Those Planning to Use the Corpus

* Tell your audience how you selected the sentences.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.

## Download the Tatoeba Corpus

[Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday.

## FAQ

* [How do I give proper attribution?](https://en.wiki.tatoeba.org/articles/show/faq#i-would-like-to-use-tatoeba's-data-for-my-project.)
* [Where can I download Tatoeba's audio data?](https://en.wiki.tatoeba.org/articles/show/faq#where-can-i-download-tatoeba's-audio-data?)
* [How can I download all sentences and translations in specific languages?](https://en.wiki.tatoeba.org/articles/show/faq#how-can-i-download-all-sentences-and-translations-)
