This page describes the repositories in which code related to Tatoeba is stored and links to each of them. Viewing some of the repositories may require registration. Contact <a href="">Team Tatoeba</a> for access.

GitHub Repositories

GitHub is the host we now favor for all Tatoeba projects except the translation of UI strings, which is hosted on Launchpad (see below). Most, but not all, of the repositories associated with Tatoeba are found on the GitHub Tatoeba organization page.

Code currently used by Tatoeba


The tatoeba2 repository contains the main part of the Tatoeba codebase, written in PHP. Its code and its issue tickets were cloned in February 2014 from the Assembla repository (see below), which is now obsolete.


The Tatodetect repository contains the code for a web service to detect the language of a given string.


The tatowiki repository contains the code that supports this wiki. It is based on the CppCMS framework.


The sinoparserd repository contains code for a service to transliterate and segment Chinese languages (Mandarin, Cantonese, Shanghainese).


The suggestd repository pertains to autocompletion functionality. It is currently used to autocomplete tags.

Code to be used by Tatoeba in the future


The tatoeba-api repository does not contain any code yet, but its wiki contains version 2 of the Tatoeba API specification.


The TatoImage repository contains code related to image manipulation and avatar caching.


The tatodb repository contains a graph database library and server, written in C, for dealing with the representation of sentences in Tatoeba. It is designed to be significantly more efficient than using an SQL database.

See also dipdowel's repositories.


The Launchpad repository contains the strings associated with the Tatoeba UI: the English strings extracted from the code (those wrapped in the __() function) and their translations. The source control system used by Launchpad is Bazaar.
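To illustrate what "strings wrapped in the __() function" means, here is a minimal Python sketch that pulls such strings out of a PHP snippet. It is not the extraction tool Tatoeba actually uses. The function name extract_ui_strings, the regular expression, and the sample template are all hypothetical, and the sketch assumes each __() call receives a single quoted literal as its first argument.

```python
import re

# Assumption: the UI string is a quoted literal passed as the first argument
# to __(). Real extraction tooling handles many more cases than this pattern.
CALL_PATTERN = re.compile(r"""__\(\s*(['"])((?:\\.|(?!\1).)*)\1""")


def extract_ui_strings(php_source):
    """Return the literal strings passed to __() in order of appearance."""
    return [match.group(2) for match in CALL_PATTERN.finditer(php_source)]


# Hypothetical template fragment, not taken from the Tatoeba codebase.
sample = """
<h1><?php echo __('Add a sentence'); ?></h1>
<p><?php echo __("Browse by language"); ?></p>
<p><?php echo $not_wrapped; ?></p>
"""

print(extract_ui_strings(sample))
```

Only the two wrapped literals are collected. The third line is ignored because its value is not passed through __(), which is why a string has to be wrapped before it can be offered for translation.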

The translation status page shows how many strings are translated for each language and how many are not. Tatoeba displays untranslated UI strings in English.

The import queue shows translation files that are waiting for review before they can be committed.


The Assembla repository is the former home of the main body of Tatoeba code. The documentation stored at the Assembla repository is being ported to this wiki.

