What should I do if I have trouble logging in?
Try ticking the "Remember Me" checkbox before you type in your username and password. If you still cannot log in after doing that, try clearing your browser's cache and cookies.
If you still have problems, send an email to Team Tatoeba (email@example.com) telling us your browser and other details that might be important.
Why are some translations in grey?
Grey translations are indirect translations. In other words, they are translations of the translations, and not translations of the main sentence (the main sentence is the sentence in big letters).
We display them because they can be useful, but you should be careful. Their meaning may differ a little from the main sentence.
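One way to picture the difference is to treat translation links as a graph: direct translations are one link away from the main sentence, and indirect (grey) translations are two links away. The sketch below is only an illustration of that idea, not Tatoeba's actual implementation; the `links` dictionary and the sentence IDs in it are hypothetical.

```python
def direct_and_indirect(links, main_id):
    """Return (direct, indirect) sets of translation IDs for main_id.

    links maps a sentence ID to the set of IDs it is directly linked to.
    """
    direct = set(links.get(main_id, ()))
    indirect = set()
    for translation_id in direct:
        # Translations of a translation are candidates for "indirect".
        indirect.update(links.get(translation_id, ()))
    # The main sentence and its direct translations are not indirect.
    indirect -= direct
    indirect.discard(main_id)
    return direct, indirect


# Hypothetical IDs: 1 is the main sentence, 2 and 3 translate it,
# and 4 translates 2 only.
links = {
    1: {2, 3},
    2: {1, 4},
    3: {1},
    4: {2},
}
direct, indirect = direct_and_indirect(links, 1)
print(sorted(direct))    # [2, 3]
print(sorted(indirect))  # [4]
```

Here sentence 4 would be shown in grey on the page of sentence 1, because it was written as a translation of sentence 2 and nobody has checked it against sentence 1.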
Why are some sentences in red?
Sentences in red have been marked as "unapproved" or "unreliable" because they have caused problems of some kind (not meeting Tatoeba's quality standards, copyright issues, spam, etc.).
Red sentences will be excluded from the downloadable files that Tatoeba exports weekly, and from the random sentence on the homepage.
Most of the sentences in red result from the suspension of an account. When a user has contributed a large number of problematic sentences and the account gets suspended, admins can mark all the sentences of that account as red. This can of course mark perfectly valid sentences as red, but admins make that choice when it would be too time-consuming to review each sentence of the account individually.
If you come across a red sentence that you think is valid, you can simply post a comment mentioning @TatoebaAdmins. At the moment, only an admin can remove the red mark.
When a sentence has several possible translations, should I translate it with all the possibilities?
It's really up to you.
If you think one translation is much more likely than the others, you can add only that one. If you think it makes sense to add all the possible translations, you can add them all.
If you think all the translations make sense but adding them all every time gets tedious (and it surely will), you can simply pick one at random or according to your personal preference.
We don't need to have every sentence translated in every possible way. What is important to us is that the corpus as a whole covers all the possible ways.
For example, we have English sentences with the word "you" and this word can be translated in different ways in other languages. For the sake of simplicity, we'll assume two forms: singular and plural. In this case, we just want to make sure that at least one sentence is translated with the singular form and at least one sentence is translated with the plural form. So if you notice no one translated with the singular form, then you can add it (and vice versa for the plural form). If you feel that one form is under-represented compared to the other, then you can favor translations with the form that is under-represented so that we have a good balance between the two.
Why do I not see all the translations I expect to see?
If you have listed a series of language codes in your settings, Tatoeba will only display translations in the languages you indicated. Leave the field empty to display translations in all languages.
A sentence is not marked with the right language. How do I fix it?
If you are using the old interface design, click on the language icon (usually a flag) to the left of the sentence and select the correct language from the drop-down list. If you are using the new interface design, click on the "edit" icon (pen). If you are not sure whether you are using the old or new interface, go to your "Settings" page and see whether the option next to the text "Display sentences with the new design" is checked.
How can I add tags to a sentence?
To add tags, you must be an advanced contributor.
How can I request a new language?
When contributing in Chinese, should I use simplified or traditional characters?
You can use whichever you like. We have a tool that will automatically convert simplified into traditional, and traditional into simplified.
When browsing sentences, if you set the Chinese sentence as the main sentence, and you have chosen to use the older interface via the settings, you will see an additional icon at the top of the sentence.
If you are using the newer interface, you will see a corner arrow (↳) instead.
Below each Chinese sentence, you will also see the transcription in pinyin, and below the pinyin, the conversion into simplified or traditional.
You can browse the Chinese sentences to see what they look like.
How do I delete my account?
Does Tatoeba provide an API?
No, it does not (yet).
We unfortunately do not have the proper infrastructure to host a public API. Nonetheless, please do not hesitate to contact us to let us know that you would be interested in this.
As more and more people ask for it, we will eventually start working on one, and we will be happy to hear more details about the needs of your application or project.
Meanwhile, what you can do is download our sentences from the Downloads page, then build your own API from there.
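As a starting point for building your own lookup layer, here is a minimal sketch. It assumes the downloaded sentences file is tab-separated with three columns (sentence ID, language code, text); check the description on the Downloads page before relying on that. The function names and the sample data are hypothetical.

```python
import csv
import io


def load_sentences(tsv_file):
    """Index sentences by ID from a tab-separated file of id, language, text."""
    sentences = {}
    # QUOTE_NONE keeps quotation marks inside sentences as plain text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip lines that do not match the assumed layout
        sentence_id, lang, text = row
        sentences[int(sentence_id)] = {"lang": lang, "text": text}
    return sentences


def find_by_language(sentences, lang):
    """Return (id, text) pairs for one language, ordered by ID."""
    return [
        (sentence_id, entry["text"])
        for sentence_id, entry in sorted(sentences.items())
        if entry["lang"] == lang
    ]


# Hypothetical sample standing in for the downloaded file.
SAMPLE = "1\teng\tHello.\n2\tfra\tBonjour.\n3\teng\tThank you.\n"
sentences = load_sentences(io.StringIO(SAMPLE))
print(sentences[2]["text"])
print(find_by_language(sentences, "eng"))
```

With the real file, you would pass `open(path, encoding="utf-8", newline="")` instead of the `io.StringIO` sample, and you could then expose these functions through whatever web framework you prefer.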
I would like to use Tatoeba's data for my project. How do I give proper attribution?
For the textual data
Here's an example of good attribution: https://www.clozemaster.com/faq#where-are-the-sentences-from
For the audio data
Our audio corpus has a wider range of licenses and isn't just restricted to CC-BY. You should therefore be more careful about which audio you are using, especially if your project/app is commercial.
You can check the license of each audio recording from the file we release under "Sentences with audio" on our Downloads page.
We recommend that you mention the username of each member whose audio you are reusing, as well as the license they chose.
Here's an example of attribution:
All the audio comes from Tatoeba (https://tatoeba.org), more specifically from the following members of Tatoeba:
- userA (license: CC-BY-SA)
- userB (license: CC-BY-NC)
- userC (license: CC-BY)
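If you reuse many recordings, you may want to generate this attribution text rather than write it by hand. The sketch below assumes you have already collected one (username, license) pair per audio file you use, for instance from the "Sentences with audio" file; the function name and the sample pairs are hypothetical.

```python
def build_attribution(recordings):
    """Build an attribution text from (username, license) pairs.

    recordings may contain one pair per audio file; duplicates are merged.
    """
    contributors = sorted(set(recordings))
    lines = [
        "All the audio comes from Tatoeba (https://tatoeba.org), "
        "more specifically from the following members of Tatoeba:"
    ]
    for username, license_name in contributors:
        lines.append(f"- {username} (license: {license_name})")
    return "\n".join(lines)


# Hypothetical input: userB contributed two of the recordings used.
recordings = [
    ("userB", "CC-BY-NC"),
    ("userA", "CC-BY-SA"),
    ("userB", "CC-BY-NC"),
    ("userC", "CC-BY"),
]
attribution = build_attribution(recordings)
print(attribution)
```

A member who released recordings under more than one license would appear once per license, which seems like the safer behavior for attribution purposes.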
Where can I download Tatoeba's audio data?
Currently the only way you can download audio is by fetching each audio file one by one. We don't have one big ZIP file that contains all our audio.
The only exception is English: tatoeba_audio_eng.zip, a 3.8 GB file generated in November 2017 at the request of the Common Voice project, which wanted to mention our data on their Datasets page.
If you'd like something more up-to-date or in other languages, you would have to do some scripting, using the files on our Downloads page, more specifically under the sections:
- "Sentences with audio": to get the IDs of all the sentences that have audio
- "Sentences": to find out the language of each sentence
Once you have the language and the ID, the URL to download the audio file is:
For instance: https://audio.tatoeba.org/sentences/eng/7347611.mp3
Note: if you are going to use this data in one of your projects/apps, please be mindful of the license!
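The scripting described above could look roughly like the sketch below. The URL template is an assumption generalized from the single example URL given above (language code, then sentence ID, then `.mp3`), so verify it against a few real sentences first. The function name and the second sentence ID are hypothetical.

```python
# Assumed pattern, inferred from the example URL for sentence 7347611.
AUDIO_URL_TEMPLATE = "https://audio.tatoeba.org/sentences/{lang}/{sentence_id}.mp3"


def audio_urls(audio_ids, languages):
    """Yield (sentence_id, url) for each sentence whose language is known.

    audio_ids: sentence IDs taken from the "Sentences with audio" file.
    languages: dict mapping sentence ID to language code, built from
               the "Sentences" file.
    """
    for sentence_id in audio_ids:
        lang = languages.get(sentence_id)
        if lang is None:
            continue  # no language information, so no URL can be built
        url = AUDIO_URL_TEMPLATE.format(lang=lang, sentence_id=sentence_id)
        yield sentence_id, url


audio_ids = [7347611, 42]        # 42 is a hypothetical ID with no known language
languages = {7347611: "eng"}
urls = list(audio_urls(audio_ids, languages))
print(urls)
```

Each URL can then be fetched with a standard tool such as `urllib.request.urlretrieve`. Fetching files one by one puts load on the server, so pausing between requests is a considerate default.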
How can I download all sentences and translations in specific languages?
From the Downloads page, you can download all sentences in all languages, or all sentences in a specific language. You can also download the translation links for all sentences, but this will probably only be of use to you if you are comfortable writing code.
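To give an idea of the code involved, here is a minimal sketch that combines the sentences and the translation links into sentence pairs for one language pair. It assumes the sentences file has three tab-separated columns (ID, language code, text) and the links file has two (sentence ID, translation ID); check the Downloads page for the actual layout. The function names and sample data are hypothetical.

```python
def parse_sentences(lines):
    """Map sentence ID to (language, text) from tab-separated lines."""
    sentences = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            sentences[int(parts[0])] = (parts[1], parts[2])
    return sentences


def translation_pairs(sentences, link_lines, source_lang, target_lang):
    """Return (source text, target text) pairs for the given languages."""
    pairs = []
    for line in link_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        source = sentences.get(int(parts[0]))
        target = sentences.get(int(parts[1]))
        if source is None or target is None:
            continue  # a link may point at a sentence we did not load
        if source[0] == source_lang and target[0] == target_lang:
            pairs.append((source[1], target[1]))
    return pairs


# Hypothetical samples standing in for the downloaded files.
SENTENCES = ["1\teng\tHello.\n", "2\tfra\tBonjour.\n", "3\tdeu\tHallo.\n"]
LINKS = ["1\t2\n", "2\t1\n", "1\t3\n", "3\t1\n"]

sentences = parse_sentences(SENTENCES)
pairs = translation_pairs(sentences, LINKS, "eng", "fra")
print(pairs)
```

The full files are large, so with real data you would iterate over open file objects line by line, as this sketch allows, rather than reading everything into a list first.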
At the site ManyThings.org, you will find sets of English sentences and their translations into other languages compiled by a Tatoeba member.
If you are comfortable with modifying Python code, you can use the Tatoeba Playground project on GitHub to obtain sentences and translations satisfying the conditions you specify.
You may also be able to find scripts that already do what you want, for example through a Google search.
You can also check our related Google Group thread. If you have written a script that you want to share, feel free to post a reply to this thread.