Guidelines for Editing Japanese Sentences in Tatoeba


These guidelines, which have been prepared by Jim Breen, are intended to help Japanese speakers who are reading and assessing the sentences.

Many of the Japanese-English pairs of sentences in the Tatoeba database come from the earlier Tanaka Corpus, which was compiled in Japan by the late Professor Tanaka and his students. The sentence pairs are used in many online dictionary sites and dictionary apps to provide examples of usage of Japanese terms. To enable this, a set of word-level indices has been compiled for many of the Japanese sentences. These indices are maintained within the Tatoeba Project.

It is important that any modifications made to the Japanese sentences avoid disrupting the indexing system. Jim Breen, who distributes the indices, monitors the changes to the Japanese sentences and adjusts the indices. At present, about 50-70 sentences are changed each week, and Jim has to update the indices manually for those sentences, so it is important that changes be kept to a reasonable number.

Guidelines for Updating Japanese Sentences

The following guidelines should be followed when changing the Japanese sentences:

  • small changes which improve the quality of a sentence and make it sound more natural are generally fine. Examples of these include:

    • correcting 変換ミス (kana-to-kanji conversion errors);
    • converting a term from kana to kanji or vice versa;
    • removing unnecessary use of 私は, あなたは, etc.;
    • changes such as している to してる;
    • adding よ, の, etc.;
    • correcting verb inflections;
    • tidying up the punctuation (this doesn't affect the indexing).

  • more significant changes, such as changing a noun or verb to another one, should be handled with caution. If the existing term is wrong, it is fine to change it, but you MUST add a comment saying why you are making the change. It is a good idea to include "@JimBreen" somewhere in the comment to give Jim early notice of the change. Similarly, changing a term is acceptable if it significantly improves the sentence, but you must still add a comment.

  • if you think the sentence needs a completely new version, you can add a NEW sentence linked to the English sentence instead, and explain in the comments attached to the English sentence why you are doing this. Again, include "@JimBreen" in the comments, because if all the terms in the existing sentence are already covered by other sentences, it may be easiest to remove that sentence from the set used for examples.

We greatly appreciate Japanese speakers reading and adopting these sentences. Adoption lets people in the project know that a native speaker has "given their stamp of approval."

Jim Breen, June 2021


