Version at: 17/08/2018, 10:59 vs. version at: 17/10/2021, 16:23

Version at: 17/08/2018, 10:59

# How to add a new transcription, transliteration or alternative script

This article explains how to add a new transcription, transliteration or alternative script on Tatoeba. We will further refer to these three terms as “transcriptions” in this article because they are technically handled the same way on Tatoeba. The goal of transcriptions is to allow people to read sentences using a different writing system.

## Warning
This article is subject to change.

## Requirements
Members may request the addition of new transcriptions by posting a message on the Wall. There are a few requirements.

* The writing system must have an identified [ISO 15924 code](https://en.wikipedia.org/wiki/ISO_15924). It’s a 4-letter code that identifies scripts.

* If there are no existing transcriptions for the language, the ISO 15924 code of the script used in existing sentences must be identified as well.

* A link to a page (like a Wikipedia article) that explains the transcription system and shows that it is used in real-life. You may as well comment on the transcription to argue how useful would it be for Tatoeba to have it.

* A list of a substantial amount of transcription pairs. A transcription pair is a sentence and its expected transcription. A “substantial amount” means it shows how the transcription system works as fully as possible, and that the more, the better. The list will be used by developers to ensure the transcription algorithm works and maintain it without having to be experts in the transcription themselves. It should ideally include both real-life examples of transcriptions as well as “edge-case transcriptions” that show particular cases the algorithm should handle. Try to think hard about all the possibilities. A few hints: how to handle proper nouns, punctuation?

## Autogenerated transcriptions
We provide autogenerated transcriptions for each language we allow transcriptions in. It means a piece of software reads original sentences and converts them into the target script as soon as they are modified or added. You can see whether a transcription has been autogenerated or further edited by contributors by mouse hovering it.

Depending on the quality, accuracy and reliableness of the software used, autogenerated transcriptions may be further editable by contributors or displayed with a warning icon. Editable transcriptions are marked with a pen icon on the left (which may appear disabled depending on your particular edition rights on a particular transcription). No pen means the transcription is not editable at all (or that you're not logged in). 

Developers decide whether a given type of transcription is made editable or not after consulting contributors and checking accuracy against the provided list of transcription pairs. Editable transcriptions usually meet the following requirements:

* The format of the transcription (syntax, etc.) is clearly defined. If it's not, it's more desirable not to allow edition until a format has been agreed upon, in order to avoid inconsistencies.

* The autogeneration software is *mostly* reliable but produces errors from times to times. If it's *rather not* reliable, edition may be prevented because it would require too much human work from contributors to fix all the transcriptions. If it's *near-100% perfect*, edition may be prevented as well unless it produces substantial errors.

Version at: 17/10/2021, 16:23

# How to add a new transcription, transliteration or alternative script

This article explains how to add a new transcription, transliteration or alternative script on Tatoeba. We will further refer to these three terms as “transcriptions” in this article because they are technically handled the same way on Tatoeba. The goal of transcriptions is to allow people to read sentences using a different writing system.

## Requirements
Members may request the addition of new transcriptions by posting a message on the Wall. There are a few requirements:

* The writing system must have an identified [ISO 15924 code](https://en.wikipedia.org/wiki/ISO_15924). This is a 4-letter code that identifies scripts.

* If there are no existing transcriptions for the language, the ISO 15924 code of the script used in existing sentences must be identified as well.

* A link to a page (like a Wikipedia article) that explains the transcription system and shows that it is used in real life. Add a comment to explain why it would be useful for Tatoeba.

* A list of a substantial number of transcription pairs. A transcription pair is a sentence and its expected transcription. A “substantial number” is sufficient to show as fully as possible how the transcription system works; the more, the better. The list will be used by developers to ensure the transcription algorithm works and maintain it without having to be experts in the transcription themselves. It should ideally include both real-life examples of transcriptions as well as “edge-case transcriptions” that show particular cases the algorithm should handle. Try to think hard about all the possibilities. A few hints: how should proper nouns and punctuation be handled?
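
A list of transcription pairs like the one requested above lends itself to an automated check. The following is a minimal sketch in Python, not Tatoeba's actual implementation: `is_iso15924_code`, `check_pairs` and the toy `transcribe` function are hypothetical names introduced for illustration, and the small Cyrillic-to-Latin table is a deliberately incomplete example rather than a real transcription system.

```python
import re

# ISO 15924 codes are written as one uppercase letter followed by
# three lowercase letters, for example "Latn" or "Cyrl".
ISO_15924_SHAPE = re.compile(r"[A-Z][a-z]{3}")


def is_iso15924_code(code):
    """Check only the shape of the code, not whether it is registered."""
    return ISO_15924_SHAPE.fullmatch(code) is not None


# Hypothetical, intentionally incomplete mapping (lowercase letters only).
TOY_TABLE = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"}


def transcribe(sentence):
    """Toy transcription: map known characters, pass everything else through."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in sentence)


def check_pairs(transcribe_fn, pairs):
    """Return (source, expected, actual) for every pair that does not match."""
    failures = []
    for source, expected in pairs:
        actual = transcribe_fn(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Each pair is a sentence and its expected transcription.
# The last pair is an edge case: a capital letter and punctuation.
PAIRS = [
    ("да", "da"),
    ("нет", "net"),
    ("Да!", "Da!"),
]

if __name__ == "__main__":
    for source, expected, actual in check_pairs(transcribe, PAIRS):
        print(f"{source!r}: expected {expected!r}, got {actual!r}")
```

In this sketch the edge-case pair fails because the toy table has no entry for the capital letter, which is the kind of gap that a broad list of pairs helps developers catch without being experts in the script.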

## Autogenerated transcriptions
We provide autogenerated transcriptions for each language for which we allow transcriptions. This means that a piece of software reads the original sentences and converts them into the target script as soon as they are modified or added. You can see whether a transcription has been autogenerated or further edited by contributors by hovering a mouse over it.

Depending on the quality, accuracy and reliableness of the software used, autogenerated transcriptions may be further editable by contributors or displayed with a warning icon. Editable transcriptions are marked with a pen icon on the left (which may appear disabled depending on your particular editing rights on a particular transcription). No pen means the transcription is not editable at all (or that you're not logged in). 

Developers decide whether a given type of transcription is made editable or not after consulting contributors and checking accuracy against the provided list of transcription pairs. Editable transcriptions usually meet the following requirements:

* The format of the transcription (syntax, etc.) is clearly defined. If it's not, it's more desirable not to allow editing until a format has been agreed upon, in order to avoid inconsistencies.

* The autogeneration software is *mostly* reliable but produces errors from time to time. If it's *not very* reliable, editing may be prevented because it would require too much human work from contributors to fix all the transcriptions. If it's *near-100% perfect*, editing may be prevented as well.
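
The decision described above depends partly on how often the autogenerated output matches the expected transcriptions. As a rough illustration only, the sketch below computes that match rate and maps it to the situations in the list. The function names and the cut-off values `0.80` and `0.99` are assumptions made up for this example: the article states no numeric thresholds, and developers also consult contributors before deciding.

```python
def accuracy(transcribe_fn, pairs):
    """Fraction of pairs whose autogenerated output equals the expected one."""
    if not pairs:
        raise ValueError("at least one transcription pair is required")
    correct = sum(
        1 for source, expected in pairs if transcribe_fn(source) == expected
    )
    return correct / len(pairs)


def suggest_editability(rate, format_defined, low=0.80, high=0.99):
    """Map a match rate to a suggestion; `low` and `high` are hypothetical."""
    if not format_defined:
        # Without an agreed format, edits would introduce inconsistencies.
        return "not editable: agree on a format first"
    if rate < low:
        # Too unreliable: fixing everything by hand would be too much work.
        return "not editable: too many errors to fix by hand"
    if rate >= high:
        # Nearly perfect: manual editing is rarely needed.
        return "not editable: output is almost always correct"
    return "editable: mostly reliable, occasional errors"


if __name__ == "__main__":
    # str.lower stands in for a real transcription function here.
    demo_pairs = [("ABC", "abc"), ("Hi", "hi"), ("X", "y"), ("Q", "q")]
    rate = accuracy(str.lower, demo_pairs)
    print(rate, suggest_editability(rate, format_defined=True))
```

Keeping the thresholds as parameters reflects that the cut-offs are a matter of judgment rather than fixed rules.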
