Notice
This page shows a previous version of the article.
FAQ
What should I do if I have trouble logging in?
Try clicking the "Remember Me" checkbox before you type in your username and password. If you still have problems logging in when you do that, try clearing your cache and cookies. Here are some guides:
If you still have problems, send an email to Team Tatoeba (team@tatoeba.org) telling us your browser and other details that might be important.
Why are some translations in grey?
Grey translations are indirect translations. In other words, they are translations of the translations, and not translations of the main sentence (the main sentence is the sentence in big letters).
We display them because they can be useful, but you should be careful. Their meaning may differ a little from the main sentence.
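The difference between direct and indirect translations can be illustrated with a small sketch. It assumes translation links are available as pairs of sentence IDs and treats an indirect translation as a sentence linked to a direct translation but not to the main sentence itself. The IDs and the function names are hypothetical, and this is not a description of how Tatoeba computes the grey translations internally.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected adjacency map from (sentence_id, translation_id) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def direct_and_indirect(graph, main_id):
    """Return (direct, indirect) translation IDs for the main sentence."""
    direct = set(graph.get(main_id, set()))
    indirect = set()
    for translation_id in direct:
        # Translations of a translation are candidates for "indirect".
        indirect |= graph.get(translation_id, set())
    indirect -= direct          # already shown as direct translations
    indirect.discard(main_id)   # the main sentence is not its own translation
    return direct, indirect

# Hypothetical IDs: 1 is the main sentence, 2 and 3 translate it,
# and 4 is linked to 2 only.
links = [(1, 2), (1, 3), (2, 4)]
graph = build_graph(links)
direct, indirect = direct_and_indirect(graph, 1)
print(sorted(direct), sorted(indirect))  # [2, 3] [4]
```

Sentence 4 would be the grey one here: it was written as a translation of sentence 2, so its meaning may have drifted from sentence 1.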
When a sentence has several possible translations, should I translate it with all the possibilities?
It's really up to you.
If you think one translation is a lot more likely than the other, you can add only that translation. If you think it makes sense to add all the translations, then you can add all the possible translations.
If you think all the translations make sense but it gets boring to add them all every time (and it surely will), then you can just choose randomly, or based on your personal preference, which translation to add.
We don't need to have every sentence translated in every possible way. What is important to us is that the corpus as a whole covers all the possible ways.
For example, we have English sentences with the word "you" and this word can be translated in different ways in other languages. For the sake of simplicity, we'll assume two forms: singular and plural. In this case, we just want to make sure that at least one sentence is translated with the singular form and at least one sentence is translated with the plural form. So if you notice no one translated with the singular form, then you can add it (and vice versa for the plural form). If you feel that one form is under-represented compared to the other, then you can favor translations with the form that is under-represented so that we have a good balance between the two.
Why do I not see all the translations I expect to see?
If you have listed a series of language codes in your settings, Tatoeba will only display translations in the languages you indicated. Leave the field empty to display translations in all languages.
Why are some sentences in red?
Red sentences are not approved. They raise copyright issues or are otherwise problematic.
You should not translate them.
A sentence is not marked with the right language. How do I fix it?
If you are using the old interface design, click on the language icon (usually a flag) to the left of the sentence and select the correct language from the drop-down list. If you are using the new interface design, click on the "edit" icon (pen). If you are not sure whether you are using the old or new interface, go to your "Settings" page and see whether the option next to the text "Display sentences with the new design" is checked.
How can I add tags to a sentence?
To add tags, you must be an advanced contributor.
=> See article: advanced-contributors
How can I request a new language?
=> See article: new-language-request
When contributing in Chinese, should I use simplified or traditional characters?
You can use whichever you like. We have a tool that will automatically convert simplified into traditional, and traditional into simplified.
When browsing sentences, if you set the Chinese sentence as the main sentence, you will see an additional icon at the top of the sentence.
- traditional
- simplified
Below each Chinese sentence, you will also see the transcription in pinyin, and below the pinyin, the conversion into simplified or traditional.
You can browse the Chinese sentences to see what they look like.
How do I delete my account?
=> See article: delete-account
Does Tatoeba provide an API?
No, it does not (yet).
We unfortunately do not have the proper infrastructure to host a public API. Nonetheless, please do not hesitate to contact us to let us know that you would be interested in this.
With more and more people asking us, we will eventually start something and we will be happy to hear more details about the needs of your application/project.
Meanwhile, what you can do is download our sentences from the Downloads page, then build your own API from there.
I would like to use Tatoeba's data for my project. How do I give proper attribution?
For the textual data
Basically you just need to write somewhere that some/all of your sentences are from Tatoeba, with a link to https://tatoeba.org, and mention that Tatoeba's data is released under CC-BY 2.0 FR.
Here's an example of good attribution: https://www.clozemaster.com/about#where-are-the-sentences-from
For the audio data
Our audio corpus has a wider range of licenses and isn't just restricted to CC-BY. You should therefore be more careful about which audio you are using, especially if your project/app is commercial.
You can check the license of each audio recording from the file we release under "Sentences with audio" on our Downloads page.
We recommend that you mention the username of each member whose audio you are reusing, as well as the license they chose.
Here's an example of attribution:
All the audio comes from Tatoeba (https://tatoeba.org), more specifically from the following members of Tatoeba:
- userA (license: CC-BY-SA)
- userB (license: CC-BY-NC)
- userC (license: CC-BY)
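If you reuse many recordings, a notice like the one above can be generated rather than written by hand. The sketch below assumes you have already collected a (username, license) pair for each recording you use, for example from the "Sentences with audio" file. The function name and the input shape are hypothetical.

```python
def build_attribution(records):
    """Build an attribution notice from (username, license) pairs."""
    licenses_by_user = {}
    for username, license_name in records:
        # A member may appear many times; keep each license only once.
        licenses_by_user.setdefault(username, set()).add(license_name)

    lines = [
        "All the audio comes from Tatoeba (https://tatoeba.org), "
        "more specifically from the following members of Tatoeba:"
    ]
    for username in sorted(licenses_by_user):
        licenses = ", ".join(sorted(licenses_by_user[username]))
        lines.append(f"- {username} (license: {licenses})")
    return "\n".join(lines)

# Placeholder usernames taken from the example above.
records = [
    ("userA", "CC-BY-SA"),
    ("userB", "CC-BY-NC"),
    ("userA", "CC-BY-SA"),
    ("userC", "CC-BY"),
]
print(build_attribution(records))
```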
Where can I download Tatoeba's audio data?
Currently the only way you can download audio is by fetching each audio file one by one. We don't have one big ZIP file that contains all our audio.
We only have one ZIP file for English, namely tatoeba_audio_eng.zip, a 3.8 GB file generated in November 2017, upon request from the Common Voice project, which wanted to mention our data on their Datasets page.
If you'd like something more up-to-date or in other languages, you would have to do some scripting, using the files on our Downloads page, more specifically under the sections:
- "Sentences with audio": to get the IDs of all the sentences that have audio
- "Sentences": to find the language of each sentence
Once you have the language and the ID, the URL to download the audio file is:
https://audio.tatoeba.org/sentences/{lang}/{id}.mp3
For instance: https://audio.tatoeba.org/sentences/eng/7347611.mp3
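The scripting described above could look roughly like the following sketch. It relies on two assumptions about the downloaded files that you should check against the actual data: that the "Sentences" file is tab-separated with the columns ID, language and text, and that the "Sentences with audio" file is tab-separated with the sentence ID in the first column. The function name is hypothetical, and the sketch only builds the URLs; fetching the files is left out.

```python
import csv
import io

AUDIO_URL = "https://audio.tatoeba.org/sentences/{lang}/{id}.mp3"

def audio_urls(sentences_file, audio_file):
    """Yield one audio URL per sentence ID listed in audio_file.

    Assumed layout (verify against the files you download):
    - sentences_file: tab-separated, columns id, lang, text
    - audio_file: tab-separated, sentence ID in the first column
    """
    lang_by_id = {}
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            lang_by_id[row[0]] = row[1]

    seen = set()
    for row in csv.reader(audio_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue
        sentence_id = row[0]
        # Skip repeated IDs and IDs whose language is unknown.
        if sentence_id in seen or sentence_id not in lang_by_id:
            continue
        seen.add(sentence_id)
        yield AUDIO_URL.format(lang=lang_by_id[sentence_id], id=sentence_id)

# In-memory stand-ins for the downloaded files. Only the ID 7347611 and the
# language "eng" come from the example above; everything else is made up.
sentences = io.StringIO("7347611\teng\tSample text.\n42\tfra\tTexte.\n")
with_audio = io.StringIO("7347611\tsomeuser\tsome license\t\n")
print(list(audio_urls(sentences, with_audio)))
```

With real data you would pass file objects opened with `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` stand-ins.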
Note: if you are going to use this data in one of your projects/apps, please be mindful about the license!
How can I download all sentences and translations in specific languages?
From the Downloads page, you can download all sentences in all languages, or all sentences in a specific language. You can also download the translation links for all sentences.
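As a rough sketch of how the sentences and the translation links can be combined, the code below extracts sentence pairs for one language pair. It assumes the sentences file is tab-separated with the columns ID, language and text, and that the links file is tab-separated with a sentence ID followed by the ID of its translation. Both layouts are assumptions to verify against the files you download, and the function name and sample data are hypothetical.

```python
import csv
import io

def translation_pairs(sentences_file, links_file, src_lang, dst_lang):
    """Return (source_text, target_text) pairs for one language pair.

    Assumed layout (verify against the files you download):
    - sentences_file: tab-separated, columns id, lang, text
    - links_file: tab-separated, columns sentence_id, translation_id
    """
    sentences = {}
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3:
            sentences[row[0]] = (row[1], row[2])

    pairs = []
    for row in csv.reader(links_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue
        src = sentences.get(row[0])
        dst = sentences.get(row[1])
        # Keep the link only if both ends exist and match the requested languages.
        if src and dst and src[0] == src_lang and dst[0] == dst_lang:
            pairs.append((src[1], dst[1]))
    return pairs

# Made-up sample data standing in for the downloaded files.
sentences = io.StringIO("1\teng\tHello.\n2\tfra\tBonjour.\n3\tdeu\tHallo.\n")
links = io.StringIO("1\t2\n2\t1\n1\t3\n3\t1\n")
print(translation_pairs(sentences, links, "eng", "fra"))  # [('Hello.', 'Bonjour.')]
```

This version keeps every sentence in memory, which is simple but may be heavy for the full corpus; filtering the sentences file by language first would reduce the footprint.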
If you are comfortable with modifying Python code, you can use the Tatoeba Playground project on GitHub to obtain sentences and translations satisfying the conditions you specify.
At the site ManyThings.org, you will find sets of English sentences and their translations into other languages compiled by a Tatoeba member.
You may also be able to find scripts that already do what you want. Here are some examples of Google searches you can do to find these scripts:
You can also check our related Google Group thread. If you have written a script that you want to share, feel free to post a reply to this thread.
Related Articles
Quick Start Guide
Rules and Guidelines