
This page shows a previous version of the article.

How to add a new transcription, transliteration or alternative script

This article explains how to add a new transcription, transliteration or alternative script on Tatoeba. From here on, we refer to all three as “transcriptions” because they are technically handled the same way on Tatoeba. The goal of transcriptions is to allow people to read sentences in a different writing system.


This article is subject to change.


Members may request the addition of new transcriptions by posting a message on the Wall. There are a few requirements.

  • The writing system must have an identified ISO 15924 code. This is a 4-letter code that identifies a script.

  • If there are no existing transcriptions for the language, the ISO 15924 code of the script used in existing sentences must be identified as well.

  • A link to a page (like a Wikipedia article) that explains the transcription system and shows that it is used in real life. You may also comment on the transcription to explain how useful it would be for Tatoeba to have it.

  • A substantial list of transcription pairs. A transcription pair is a sentence and its expected transcription. “Substantial” means the list shows how the transcription system works as fully as possible; the more pairs, the better. Developers will use the list to verify that the transcription algorithm works and to maintain it without having to be experts in the transcription system themselves. Ideally, it should include both real-life examples of transcriptions and “edge-case transcriptions” that show particular cases the algorithm must handle. Try to think through all the possibilities. A few hints: how should proper nouns and punctuation be handled?
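
To illustrate why the list of transcription pairs matters, here is a minimal Python sketch of how such a list could be used as a regression test. It is not Tatoeba's actual implementation. The names `TranscriptionPair`, `looks_like_iso_15924`, `toy_transcribe` and `find_failures` are hypothetical, and the five-letter Cyrillic-to-Latin table is a toy chosen only for the example. The format check only tests the shape of an ISO 15924 code (one capital letter followed by three lowercase letters, as in "Latn" or "Cyrl"); it does not check that the code is actually registered.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Tuple

# Shape of an ISO 15924 code as conventionally written: "Latn", "Cyrl", ...
# This checks the format only, not whether the code is registered.
ISO_15924_SHAPE = re.compile(r"[A-Z][a-z]{3}")


class TranscriptionPair(NamedTuple):
    """A sentence and the transcription a contributor expects for it."""
    source: str
    expected: str


def looks_like_iso_15924(code: str) -> bool:
    """Return True if `code` has the four-letter shape of an ISO 15924 code."""
    return ISO_15924_SHAPE.fullmatch(code) is not None


# Toy table (assumption for this example): only five letters are covered.
TOY_TABLE = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"}


def toy_transcribe(sentence: str) -> str:
    """Map known letters one by one, keep capitalization,
    and leave everything else (punctuation, unknown letters) unchanged."""
    out = []
    for ch in sentence:
        mapped = TOY_TABLE.get(ch.lower())
        if mapped is None:
            out.append(ch)
        elif ch.isupper():
            out.append(mapped.capitalize())
        else:
            out.append(mapped)
    return "".join(out)


def find_failures(
    transcribe: Callable[[str], str],
    pairs: Iterable[TranscriptionPair],
) -> List[Tuple[TranscriptionPair, str]]:
    """Run the algorithm on every pair; return the pairs whose output differs."""
    failures = []
    for pair in pairs:
        actual = transcribe(pair.source)
        if actual != pair.expected:
            failures.append((pair, actual))
    return failures


if __name__ == "__main__":
    pairs = [
        TranscriptionPair("да", "da"),
        # Edge case: capitalization and punctuation must survive.
        TranscriptionPair("Да, нет.", "Da, net."),
        # Uses letters missing from the toy table, so this pair fails.
        TranscriptionPair("дом", "dom"),
    ]
    for pair, actual in find_failures(toy_transcribe, pairs):
        print(f"{pair.source!r}: expected {pair.expected!r}, got {actual!r}")
```

In this sketch, the third pair fails because the toy table has no rule for two of its letters. A contributed pair can expose a gap in the algorithm in the same way, without the developer needing to know the script. The more edge cases the list covers, the more gaps of this kind it can catch.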