| Old line | New line | Version at: 29/04/2013, 03:02 vs. version at: 29/04/2013, 03:11 |
|---|---|---|
| 1 | 1 | # Using the Tatoeba Corpus for Your Own Projects |
| 2 | 2 | |
| 3 | 3 | ## Terms of Use |
| 4 | 4 | |
| 5 | 5 | * Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use). |
| 6 | 6 | |
| 7 | 7 | ## Warning: The Tatoeba Corpus is not error-free. |
| 8 | 8 | |
| 9 | 9 | * Due to the nature of a public collaborative project, this data will never be 100% free of errors. |
| 10 | 10 | * Be aware of the following. |
| 11 | 11 | * We allow non-native speakers to contribute in languages they are learning. |
| 12 | 12 | * We ask our members not to change archaic language to something that currently sounds natural. |
| 13 | 13 | * We allow our members to submit book titles and other things you might not consider sentences. |
| 14 | 14 | * Translations may not always be accurate, even though the linked sentences are correct sentences. |
| 15 | 15 | |
| 16 | 16 | ## Suggestions for Those Planning to Use the Corpus |
| 17 | 17 | |
| 18 | 18 | * Don't use the whole corpus, but do some filtering out of obviously suspect items. (Things like items tagged @need native check, @change, archaic, non-sentence, etc. [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.) |
| 19 | 19 | * You may want to eliminate all sentences not "owned" by native speakers. However, even this will not guarantee perfect data. |
| 20 | 20 | * You should inform your audience that the data may contain errors and explain what steps you have taken to help minimize the errors. |
| 21 | 21 | * Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections. |
| | 22 | * You might want to only use sentences you have personally proofread if you are creating materials for people studying a foreign language to help make sure what you are teaching people isn't a mistake. |
| 22 | 23 | |
| 23 | 24 | |
| 24 | | |
| 25 | | |
| 26 | | |

Diff view generated by jsdifflib.

Version at: 29/04/2013, 03:02
# Using the Tatoeba Corpus for Your Own Projects
## Terms of Use
* Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
## Warning: The Tatoeba Corpus is not error-free.
* Due to the nature of a public collaborative project, this data will never be 100% free of errors.
* Be aware of the following.
    * We allow non-native speakers to contribute in languages they are learning.
    * We ask our members not to change archaic language to something that currently sounds natural.
    * We allow our members to submit book titles and other things you might not consider sentences.
    * Translations may not always be accurate, even though the linked sentences are correct sentences.
## Suggestions for Those Planning to Use the Corpus
* Don't use the whole corpus, but do some filtering out of obviously suspect items. (Things like items tagged @need native check, @change, archaic, non-sentence, etc. [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.)
* You may want to eliminate all sentences not "owned" by native speakers. However, even this will not guarantee perfect data.
* You should inform your audience that the data may contain errors and explain what steps you have taken to help minimize the errors.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
Version at: 29/04/2013, 03:11
# Using the Tatoeba Corpus for Your Own Projects
## Terms of Use
* Read the [Terms of Use](http://tatoeba.org/eng/terms_of_use).
## Warning: The Tatoeba Corpus is not error-free.
* Due to the nature of a public collaborative project, this data will never be 100% free of errors.
* Be aware of the following.
    * We allow non-native speakers to contribute in languages they are learning.
    * We ask our members not to change archaic language to something that currently sounds natural.
    * We allow our members to submit book titles and other things you might not consider sentences.
    * Translations may not always be accurate, even though the linked sentences are correct sentences.
## Suggestions for Those Planning to Use the Corpus
* Don't use the whole corpus, but do some filtering out of obviously suspect items. (Things like items tagged @need native check, @change, archaic, non-sentence, etc. [Browse Tags](http://tatoeba.org/eng/tags/view_all) to find others.)
* You may want to eliminate all sentences not "owned" by native speakers. However, even this will not guarantee perfect data.
* You should inform your audience that the data may contain errors and explain what steps you have taken to help minimize the errors.
* Since corrections are being made all the time, you should frequently update your project so your audience benefits from these corrections.
* You might want to only use sentences you have personally proofread if you are creating materials for people studying a foreign language to help make sure what you are teaching people isn't a mistake.
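
The tag-based filtering suggested above could be sketched roughly as follows. This is only an illustration, not an official Tatoeba tool. It assumes you have a tab-separated sentence file with the columns `id`, `language`, `text` and a tab-separated tag file with the columns `sentence id`, `tag name`. The file names `sentences.csv` and `tags.csv`, the column layouts, and the helper names (`suspect_ids`, `filter_sentences`, `read_tsv`) are assumptions made for this example, so check them against the files you actually download. The set of suspect tags is taken from the examples in the list above and should be extended after browsing the tag list.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator, Sequence

# Tags treated as "suspect". These are only the examples named in the text;
# browse the tag list and extend this set for your own project.
SUSPECT_TAGS = {"@need native check", "@change", "archaic", "non-sentence"}


def read_tsv(path: str) -> Iterator[list[str]]:
    """Yield rows from a tab-separated file (assumed layout, see lead-in)."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quotation marks in sentences as ordinary text.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def suspect_ids(tag_rows: Iterable[Sequence[str]],
                suspect_tags: Iterable[str] = SUSPECT_TAGS) -> set[str]:
    """Return the IDs of sentences that carry at least one suspect tag.

    Each row is expected to be (sentence_id, tag_name). Tag names are
    compared case-insensitively.
    """
    wanted = {tag.lower() for tag in suspect_tags}
    return {row[0] for row in tag_rows
            if len(row) >= 2 and row[1].strip().lower() in wanted}


def filter_sentences(sentence_rows: Iterable[Sequence[str]],
                     excluded: set[str]) -> Iterator[Sequence[str]]:
    """Yield sentence rows (id first) whose ID is not in `excluded`."""
    for row in sentence_rows:
        if row and row[0] not in excluded:
            yield row


# Hypothetical usage with downloaded files:
#   excluded = suspect_ids(read_tsv("tags.csv"))
#   for row in filter_sentences(read_tsv("sentences.csv"), excluded):
#       print("\t".join(row))
```

Filtering by tag only removes sentences that someone has already flagged, so, as the warning says, the remaining data can still contain errors. Restricting the corpus to sentences owned by native speakers would need an additional source of contributor information, which this sketch does not cover.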
Note
Lines that have a new line number but no old line number were added in the new version.
Lines that have an old line number but no new line number were removed.