Version at: 29/04/2013, 03:24 (compared with the version at: 29/04/2013, 03:21)
# Using the Tatoeba Corpus for Your Own Projects
## Terms of Use
* Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
## Warning: The Tatoeba Corpus Is Not Error-Free
* Because this is a public collaborative project, the data will never be 100% free of errors.
* Be aware of the following:
  * We allow non-native speakers to contribute in languages they are learning.
  * We ask our members not to change archaic language to something that currently sounds natural.
  * We allow our members to submit book titles and other things you might not consider sentences.
  * Translations may not always be accurate, even when each linked sentence is correct on its own.
## Suggestions for Those Planning to Use the Corpus
* Don't use the whole corpus as-is. Filter out obviously suspect items first, such as those tagged "@need native check", "@change", "archaic", or "non-sentence". [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.
* You may want to eliminate all sentences not "owned" by native speakers. However, even this will not guarantee perfect data.
* You should inform your audience that the data may contain errors and explain what steps you have taken to help minimize the errors.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
* If you are creating materials for people studying a foreign language, you might want to use only sentences you have personally proofread. This helps ensure that you are not teaching people mistakes.
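
The tag-based filtering suggested above can be sketched in Python as follows. This is a minimal sketch, not an official tool. It assumes that the tag export is tab-separated with one `sentence ID, tag name` pair per line and that the sentence export is tab-separated with `sentence ID, language code, text` per line; check the files you actually download. The names `SUSPECT_TAGS`, `load_suspect_ids`, and `filter_sentences` are hypothetical.

```python
import csv

# Tags named in the suggestions above. This set is only a starting point;
# extend it after browsing the full tag list.
SUSPECT_TAGS = {"@need native check", "@change", "archaic", "non-sentence"}


def load_suspect_ids(tags_file, suspect_tags=SUSPECT_TAGS):
    """Return the IDs of sentences that carry at least one suspect tag.

    Assumption: each row is tab-separated as (sentence_id, tag_name).
    """
    reader = csv.reader(tags_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0] for row in reader if len(row) >= 2 and row[1] in suspect_tags}


def filter_sentences(sentences_file, suspect_ids):
    """Yield (sentence_id, lang, text) for rows whose ID is not suspect.

    Assumption: each row is tab-separated as (sentence_id, lang, text).
    Rows with fewer than three fields are skipped.
    """
    reader = csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) >= 3 and row[0] not in suspect_ids:
            yield row[0], row[1], row[2]
```

Both functions accept any open text file, for example `open("tags.csv", encoding="utf-8", newline="")`, where the file name is whatever you saved the export as. Removing tagged items only reduces the number of errors; untagged sentences can still be wrong.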
## Download the Tatoeba Corpus
* [Downloads](http://tatoeba.org/eng/download_tatoeba_example_sentences) are updated every Saturday at 9:00 a.m., France time.
  * Note: Downloads are actually ready about 2 minutes later.
  * France observes Central European Time (CET = GMT+1) in winter and Central European Summer Time (CEST = GMT+2) in summer.
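
Because the offset from GMT changes with daylight saving time, a script that fetches the weekly export is safer converting from France time than hard-coding an offset. The following is a rough sketch using Python's standard `zoneinfo` module (Python 3.9+, with time zone data installed on the system). The function name `next_download_ready` and the constants are hypothetical, and the 2-minute delay is only the approximate figure given above.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")    # "France time", including summer time
UPDATE_WEEKDAY = 5                  # Monday is 0, so Saturday is 5
UPDATE_TIME = time(9, 0)            # 9:00 a.m. local time
READY_DELAY = timedelta(minutes=2)  # files are ready about 2 minutes later


def next_download_ready(now_utc):
    """Return the next moment (in UTC) when a fresh export should be ready.

    `now_utc` must be a timezone-aware datetime.
    """
    now_paris = now_utc.astimezone(PARIS)
    # Days until the coming Saturday (0 if today is Saturday in France).
    days_ahead = (UPDATE_WEEKDAY - now_paris.weekday()) % 7
    day = now_paris.date() + timedelta(days=days_ahead)
    while True:
        local_update = datetime.combine(day, UPDATE_TIME, tzinfo=PARIS)
        ready = local_update.astimezone(timezone.utc) + READY_DELAY
        if ready > now_utc:
            return ready
        # This Saturday's update has already passed; try the next one.
        day += timedelta(days=7)
```

For example, calling `next_download_ready(datetime.now(timezone.utc))` gives the time to schedule the next fetch.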