Version at: 09/06/2014, 03:33 vs. version at: 09/06/2014, 03:36
**[!]** If you are interested in coding a Google Summer of Code project for Tatoeba during the summer of 2014, please submit your application on the [GSoC website](https://www.google-melange.com/gsoc/homepage/google/gsoc2014). You should apply on the website for every organization that you'd be interested in working for. The deadline is **March 21**.

This page was designed for students who were interested in coding a [Google Summer of Code 2014](https://www.google-melange.com/gsoc/homepage/google/gsoc2014) project for Tatoeba. While the deadline (March 21, 2014) passed a while ago, and the selected GSoC students have already started work on their projects, the information on this page should still be of general interest to developers.

GSoC ideas for student projects
===============================

This page lists example ideas for students who would like to take part in [Google Summer of Code](http://www.google-melange.com/) and be mentored by Tatoeba. To quote [GSoC FAQ](http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page#3._What_is_an_Ideas_list):

<blockquote>
<p>An Ideas list should be a list of suggested student projects. This list is meant to introduce contributors to your project's needs and to provide inspiration to would-be student applicants. It is useful to classify each idea as specifically as possible, e.g. "must know Python" or "easier project; good for a student with more limited experience with C++." If your organization plans to provide a proposal template for the students, it would be good to include it on your Ideas list.</p>

<p>Keep in mind that your ideas list should be a starting point for student proposals; we've heard from past mentoring organization participants that some of their best student projects are those that greatly expanded on a proposed idea or were blue-sky proposals not mentioned on the ideas list at all. A link to a bug tracker for your open source organization is NOT an ideas list.</p>

<p>You can check out the <a href="http://community.kde.org/GSoC/2011/Ideas">Ideas list for KDE</a> for Google Summer of Code in 2011 to get an idea of what we’re looking for in an ideas list.</p>
</blockquote>

If you're a student, you're invited to discuss any of these ideas, as well as propose your own. We encourage you to get familiar with the [site](http://tatoeba.org) and the [codebase](main#developers) immediately. Registering on the site, looking at the existing sentences, and adding a few of your own (preferably in the language that you know best) will be a valuable experience for you.

## Contact

To contact us developers, use one of these:

* Email/Google groups: [Tatoeba's dev mailing list](https://groups.google.com/forum/#!forum/tatoebaproject)
* IRC: [Tatoeba on #Freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join)

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](http://tatoeba.org/wall/index).


About Tatoeba (for students)
----------------------------

Tatoeba is a libre/free database of example sentences translated into many
languages. Our goal is to create a resource for people studying
languages, whether to learn them or to research them. The database is currently used:

* As a source of example sentences by free dictionaries and language
  learning websites (like Jim Breen’s WWWJDIC; Jim Breen is actually a
  member too):

  * There's a list of free dictionary and language learning websites
    using Tatoeba's corpus maintained by our member CK:
    http://a4esl.com/temporary/tatoeba/links.html

* As a rich resource for language learners: They can find out how to
  use words or how to translate grammatical constructs and idioms.

* For research: example papers include:

  * Research on treebanking Japanese (Francis Bond, 栗林 孝行 [Takayuki
    Kuribayashi], 橋本 力 [Hashimoto Chikara] (2008) HPSGに基づくフリーな日本語ツリーバンクの構築
    [A free Japanese Treebank based on HPSG]. In 14th Annual Meeting of The
    Association for Natural Language Processing, Tokyo),

  * Statistical machine translation (Eric Nichols, Francis Bond,
    Darren Scott Appling and Yuji Matsumoto (2010) Paraphrasing Training
    Data for Statistical Machine Translation. Journal of Natural Language
    Processing, 17(3), pages 101-122)

The main site currently has about 1 million page views and 250 thousand unique visitors monthly, as reported by Google Analytics, and the corpus is growing steadily by 3% or more every month.


Current site
------------

Extending the current PHP site involves:

* programming in PHP with the CakePHP framework
* shell tools for maintenance (e.g., better export scripts)
* JavaScript

### Better export scripts

Currently, CSV dumps are made weekly. They require the database to be switched into read-only mode, take 5 to 10 minutes, and omit some important information, such as tag creators and comments. CSV dumps are important for people who cooperate with Tatoeba by creating additional tools, so their quality is vital for healthy collaboration.

**Deliverables**: A database export mechanism that:

  * Dumps all interesting information (everything currently in the data dumps plus modification history, sentence comments, the wall, etc.).
  * Can create incremental dumps, which are faster and can therefore be made more frequently.
  * Provides an interface that notifies collaborators about new dumps and lets them retrieve the dumps automatically.
  * (Advanced) Provides a stream of updates via WebSockets or a similar mechanism.

**Prerequisite knowledge**: a scripting language (Python preferred), PHP, MySQL.
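
To make the idea of an incremental dump more concrete, here is a minimal sketch in Python. It assumes a hypothetical `sentences` table with a `modified` timestamp column; the real Tatoeba schema may differ, and the in-memory SQLite database merely stands in for MySQL so that the example is self-contained.

```python
import csv
import io
import sqlite3


def dump_changes_since(conn, last_dump_time, out):
    """Write every sentence modified after last_dump_time as a tab-separated line.

    Returns the newest modification time seen, to be stored as the
    checkpoint for the next incremental dump.
    Assumption: timestamps are ISO-like strings, so they sort chronologically.
    """
    rows = conn.execute(
        "SELECT id, lang, text, modified FROM sentences "
        "WHERE modified > ? ORDER BY modified, id",
        (last_dump_time,),
    ).fetchall()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    newest = last_dump_time
    for sentence_id, lang, text, modified in rows:
        writer.writerow([sentence_id, lang, text, modified])
        newest = max(newest, modified)
    return newest


# Small demonstration with made-up data (hypothetical schema).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT, modified TEXT)"
)
demo.executemany(
    "INSERT INTO sentences VALUES (?, ?, ?, ?)",
    [
        (1, "eng", "Hello.", "2014-06-01 10:00:00"),
        (2, "fra", "Bonjour.", "2014-06-05 09:30:00"),
    ],
)
demo_output = io.StringIO()
demo_checkpoint = dump_changes_since(demo, "2014-06-02 00:00:00", demo_output)
print(demo_output.getvalue(), end="")
print("next checkpoint:", demo_checkpoint)
```

Note that a timestamp-based approach like this one does not capture deleted rows; a real implementation would need a separate way to record deletions (for example, a log table).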

### Administrative Scripts

The current site has experienced a couple of crashes and periods of instability. We would like to increase the number of people who can administer it easily and reliably. To ease administration and speed up disaster recovery, we need a set of scripts covering common administrative tasks.

**Deliverables**:

*Shell scripts* that cover:

 * backup
 * restoring of backups
 * export of data
 * import of data
 * adding new languages
 * deduplication of sentences
 * indexing of sentences by the search daemon
 * preserving existing translations when typos in the source UI strings are fixed
 * getting external services up and running
 * updating the production site from the repository
 * deployment on a real server from scratch
 * deployment on a development machine from scratch
 * monitoring the server and logging load and activity
 * other necessary tasks

An *administrative interface* that lets admins manually run any of these tasks on the server, with progress bars and statistics.

**Prerequisite knowledge**: a scripting language (bash, Python, Perl, etc.), possibly familiarity with a provisioning or deployment tool (Ansible, Vagrant), and possibly familiarity with setting up and maintaining a monitoring system (New Relic, Nagios, Cacti, Munin).
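
As an illustration of one task from the list above, here is a rough Python sketch of sentence deduplication. It is only an outline under stated assumptions: sentences are taken to be `(id, lang, text)` tuples and links `(sentence_id, translation_id)` pairs, two sentences count as duplicates when language and whitespace-trimmed text are identical, and the lowest id is kept. The function names are hypothetical and not part of the existing codebase.

```python
from collections import defaultdict


def plan_deduplication(sentences):
    """Return a mapping {duplicate_id: kept_id} for identical sentences.

    Assumption: duplicates share the same language and the same text after
    trimming surrounding whitespace; the lowest id in each group is kept.
    """
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        groups[(lang, text.strip())].append(sentence_id)
    merges = {}
    for ids in groups.values():
        kept = min(ids)
        for other in ids:
            if other != kept:
                merges[other] = kept
    return merges


def remap_links(links, merges):
    """Point links at the kept sentences, dropping self-links and repeats."""
    remapped = set()
    for source, target in links:
        source = merges.get(source, source)
        target = merges.get(target, target)
        if source != target:
            remapped.add((source, target))
    return sorted(remapped)


example_sentences = [(1, "eng", "Hello."), (2, "fra", "Bonjour."), (3, "eng", "Hello. ")]
example_links = [(1, 2), (2, 1), (3, 2), (2, 3)]
example_merges = plan_deduplication(example_sentences)
print(example_merges)
print(remap_links(example_links, example_merges))
```

A production script would also have to move comments, tags, and audio from the removed sentence to the kept one, which this sketch leaves out.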

### Testing Suite

The current website has no meaningful automated tests and no form of continuous integration. Any new code added to the repository can break the existing website, and most testing is currently done manually. To make the code future-proof and give users a stable experience, we need a battery of tests for the main functionality of the website.

**Deliverables**: A set of tests that create a test database and emulate a browser, testing some or all of the functionality that the website currently offers. Refer to this [list](http://en.wiki.tatoeba.org/articles/show/functionality-test-list).

**Prerequisite knowledge**: a scripting language (Python, PHP, etc.), MySQL, and a web testing framework (Selenium or something similar).

### API

The Tatoeba database is currently accessible only through the main website interface or through data dumps. A real API that can be called through AJAX and returns machine-readable results would give external applications real-time access.

**Deliverables**: A web application that provides a set of API calls for data stored in the current database. The API should cover all data available through the current web interface, including sentence comments, wall comments, recently added sentences, and top recent contributors.

**Prerequisite knowledge**: a web application language (Python or PHP preferred), MySQL, and a data exchange format such as JSON or XML.
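
To give an idea of what a machine-readable result could look like, here is a small Python sketch that serializes a sentence and its translations as JSON. The field names (`id`, `lang`, `text`, `translations`) are assumptions made for this example and are not taken from any existing Tatoeba API specification.

```python
import json


def sentence_to_json(sentence, translations):
    """Serialize one sentence and its direct translations as a JSON string.

    Both arguments use plain dicts with hypothetical keys: id, lang, text.
    """
    payload = {
        "id": sentence["id"],
        "lang": sentence["lang"],
        "text": sentence["text"],
        "translations": [
            {"id": t["id"], "lang": t["lang"], "text": t["text"]}
            for t in translations
        ],
    }
    # ensure_ascii=False keeps non-Latin scripts readable in the output.
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


print(
    sentence_to_json(
        {"id": 1, "lang": "eng", "text": "I like tea."},
        [{"id": 3, "lang": "jpn", "text": "私はお茶が好きです。"}],
    )
)
```

An AJAX client could then parse such a response directly, without scraping the HTML pages.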

### API Compliant Interface

With an API and an API [spec](https://github.com/trang/tatoeba-api/wiki/Tatoeba-API-specification-2) in place, most direct SQL queries in the current codebase will become obsolete, because the API will be the preferred way to access the data. A rewrite of the current interface, or a completely new interface, is therefore the next logical step.

**Deliverables**: A rewrite of the current codebase to use the API instead of SQL queries, or a completely new interface in another language, preferably in a JavaScript framework.

**Prerequisite knowledge**: strong knowledge of CakePHP, familiarity with the API, and familiarity with the current codebase; or strong knowledge of a web framework (Django, for example) or a JavaScript framework (AngularJS, for example).

### Improvements in user interface for end users

Tatoeba currently handles several kinds of queries, but more are desired. The translation interface, in particular, needs improvement. Examples of desired types of queries:

* Get all sentences in a given language by a given user that have not been translated into a given language. For example: *Show me all English sentences by user "CK" not yet translated into Japanese.*

* Same as above, but limited to sentences with audio. For example: *Show me all English sentences by "CK" with audio that have not been translated into Japanese.*

* Get all sentences by native speakers of a given language not yet translated into a given language. For example: *Show me all Finnish sentences by native speakers not translated into Hungarian.*

* Get all sentences in a given language with a certain tag not translated into a given language. For example: *Show me all Georgian sentences with the tag "restaurant" not translated into Armenian.*

* Same as above, but limited to sentences by native speakers. For example: *Show me all Korean sentences by native speakers with the tag "weather" not translated into Japanese.*

* Get all sentences in a given language under a certain length not yet translated into a given language. For example: *Show me all Japanese sentences shorter than 50 characters not translated into French.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

* Get all sentences in a given language that match a given search keyword and have not been translated into a given language. For example: *Show all English sentences with the word "mountain" not translated into Japanese.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

**Deliverables:** Implementation of some or all of the above. The project might include additional queries. A generic way of adding new types of queries would be highly desirable.

**Prerequisite knowledge**: PHP, CakePHP.
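
The first query type above ("all English sentences by user CK not yet translated into Japanese") could be expressed roughly as follows. This is a sketch in Python with an in-memory SQLite database; the `sentences` and `links` tables and their columns are simplified assumptions for illustration, not the production schema, and the real site would run an equivalent query through CakePHP and MySQL.

```python
import sqlite3

# Hypothetical, simplified schema: sentences(id, lang, text, owner) and
# links(sentence_id, translation_id), with one row per link direction.
UNTRANSLATED_QUERY = """
SELECT s.id, s.text
FROM sentences AS s
WHERE s.lang = :source_lang
  AND s.owner = :owner
  AND NOT EXISTS (
      SELECT 1
      FROM links AS l
      JOIN sentences AS t ON t.id = l.translation_id
      WHERE l.sentence_id = s.id
        AND t.lang = :target_lang
  )
ORDER BY s.id
"""


def untranslated_sentences(conn, source_lang, owner, target_lang):
    """Sentences by `owner` in `source_lang` with no direct translation in `target_lang`."""
    params = {"source_lang": source_lang, "owner": owner, "target_lang": target_lang}
    return conn.execute(UNTRANSLATED_QUERY, params).fetchall()


def build_demo_db():
    """Create a tiny in-memory database with made-up example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT, owner TEXT);
        CREATE TABLE links (sentence_id INTEGER, translation_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO sentences VALUES (?, ?, ?, ?)",
        [
            (1, "eng", "I like tea.", "CK"),
            (2, "eng", "It is raining.", "CK"),
            (3, "jpn", "私はお茶が好きです。", "someone"),
            (4, "fra", "Il pleut.", "someone"),
            (5, "eng", "Good morning.", "other_user"),
        ],
    )
    conn.executemany("INSERT INTO links VALUES (?, ?)", [(1, 3), (3, 1), (2, 4), (4, 2)])
    return conn


print(untranslated_sentences(build_demo_db(), "eng", "CK", "jpn"))
```

Because the languages and the owner are parameters, the same function covers a whole family of queries. Filters such as "has audio" or "has tag X" could be added as further optional conditions, which is one possible route toward the generic mechanism mentioned in the deliverables.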

### Allow users to follow each other

This feature would allow users to keep track of sentences newly created by other users, much as Twitter does. Users could be notified of new sentences by the users they follow, and browse them. Whether the list of who is following whom should be public or private should be discussed prior to development.

**Deliverables:** a means to follow one or more users; a page that displays the sentences of the followed users and allows browsing and searching through them; configurable notifications about new sentences by the followed users; a display of who is following whom.

**Prerequisite knowledge**: PHP, CakePHP.

### Word requests

This feature would allow users to request example sentences that show the correct usage of a given word or phrase. Contributors could browse lists of 'requested words' and add sentences that include them.

People interested in this idea should consider and discuss possible implementation details prior to development. Typical questions include:

* What should be the scope of the lists (per-language, per-user…)?
* How should the lists be maintained?
* Can we indicate that a 'requested word' now has enough example sentences? If yes, how?
* What’s the lifecycle of a typical requested word?
* What if users want to express additional information in their requests, such as the context or sense for the requested word?
* What about synonyms, inflections…?

**Deliverables:** a means to request example sentences for a given word; a way to easily contribute new sentences that illustrate the requested words.

**Prerequisite knowledge**: PHP, CakePHP.

### Show pronunciation in IPA for sentences

IPA stands for "International Phonetic Alphabet" and is used to describe the pronunciation of human languages in an unambiguous way. As such, it helps in learning languages whose pronunciation rules are complex (e.g., English). Tatoeba could display an IPA pronunciation for each sentence in basically the same way it currently displays pronunciation for Japanese using kana. One possible approach is to use an external library or application to prepare IPA annotations. For example, [eSpeak](http://espeak.sourceforge.net/) seems to be able to handle several popular languages and has an IPA converter.

**Deliverables:** A mechanism that shows IPA pronunciation for some languages (chosen by the student). This can be done server-side (as a standalone service or part of the existing code) or client-side (using JavaScript). The mechanism should allow pronunciation descriptions to be pre-generated and should provide a means to edit them manually later. The mechanism can rely on a third-party tool to generate pronunciation descriptions.

**Prerequisite knowledge**: web technology, some web application stack (PHP, Python or CppCMS preferred).
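
One possible server-side shape for this, sketched in Python: pronunciations are pre-generated by an external tool and stored, and a manual correction always takes precedence over the generated value. The sketch assumes that the `espeak` command-line program is installed and that your version supports the `--ipa` option (check `espeak --help`); the `PronunciationStore` class and its method names are hypothetical.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en"):
    """Command line asking eSpeak to print IPA instead of speaking (-q = no audio)."""
    return ["espeak", "-q", "--ipa", "-v", voice, text]


def generate_ipa(text, voice="en"):
    """Return eSpeak's IPA transcription, or None if espeak is not installed."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(
        build_espeak_command(text, voice),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


class PronunciationStore:
    """Keeps pre-generated transcriptions and lets manual edits override them."""

    def __init__(self, generator=generate_ipa):
        self._generator = generator
        self._generated = {}
        self._manual = {}

    def pregenerate(self, sentence_id, text, voice="en"):
        ipa = self._generator(text, voice)
        if ipa is not None:
            self._generated[sentence_id] = ipa

    def set_manual(self, sentence_id, ipa):
        self._manual[sentence_id] = ipa

    def get(self, sentence_id):
        # A manual correction wins over the automatically generated value.
        return self._manual.get(sentence_id, self._generated.get(sentence_id))
```

Keeping generated and manual values apart means the generated ones can be refreshed later (for example, after upgrading the tool) without overwriting human corrections.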

New site & CppCMS
-----------------

Projects in this category involve helping Sysko with tatowiki and tatodb, and extending [CppCMS](http://cppcms.com/wikipp/en/page/main). As the new site is still mostly being planned, there are no specific project ideas for the moment. Please ask on the IRC channel for more information. Note that many projects in this category will be experimental in nature, and their scope will depend heavily on your skills.

Standalone user tools
---------------------

Work on tools such as the following:

* [shtooka recorder](http://a4esl.com/temporary/tatoeba/shtooka/) or [swac-record](http://zmoo.fr/swac-tools/) for recording audio for sentences
* [tatoparser](https://github.com/qdii/tatoeba_parser) for retrieving sentences that match regular expressions
* [katoeba](https://github.com/sadhen/katoeba)

Create new tools, such as apps for smartphones, for ordinary or advanced contributors.

### Android/iPhone application

iPhone users make up about 12% of the site's visitors, and Android users about 7%. A dedicated application could help them immensely.

**Deliverables**: A smartphone application for easy access to Tatoeba. Examples of features:

* Querying the online Tatoeba site
* Adding sentences
* Translating
* Performing typical corpus maintenance tasks (linking/unlinking sentences, changing sentence language, tagging, etc.)
* Accessing the wall and sentence comments
* Recording voice
* Offline database access (more difficult!)

Note: You are not expected to implement all of these features during a single GSoC term. Depending on your skills, you might prepare a proposal for a basic set of features (if you don't have much experience in mobile development yet) or a more complex or targeted application (if you do have experience and want to prepare something more feature-complete).

**Prerequisite knowledge**: Java and Android development, or iPhone and iOS development; experience using web services.

### Streamlined linking of multiple sentences

Where multiple sentences in a source language have the same translation in the target language, make it easy to link those source sentences to the same target translation. Collecting the sentences that are likely to have the same translation could be as simple as presenting sentences in order of creation, since variants of a sentence that differ only in, e.g., the number of a pronoun (where the singular and plural forms of the second person map to the same word in English) are likely to be entered consecutively.

**Prerequisite knowledge**: JavaScript, possibly Java, possibly SQL.

### Help bots

Produce bots like [those in Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Bots) to help with maintenance and repetitive tasks, such as fixing common mistakes, wrong flags, etc. As on Wikipedia, users that are actually bots should be identified as such on the website. Ideally, create a library that could be used as a base for bots that interact with the Tatoeba website.

Note: this idea would benefit greatly from having a real API, which is another project listed here.

**Prerequisite knowledge**: web technologies.

Other ideas
-----------

External services, like [CK's Temporary Tatoeba site](http://a4esl.com/temporary/tatoeba/). Other ideas.

### XMPP Integration for Tatoeba

Use the XMPP communications protocol to integrate the manipulation of sentences, comments, and wall posts, as well as live feeds of the latest comments, sentence additions, and wall posts, with services such as PubSub.

**Deliverables**:

* An XEP, ready to be submitted to the XSF, that outlines the protocol Tatoeba would use for all of those operations over XMPP.
* An implementation of this XEP as plugins for XMPP clients; Poezio and Gajim are top priorities.
* A server-side implementation of this XEP as a module; Prosody is a top priority.

**Prerequisite knowledge**: XMPP, PubSub, Python, Lua, and familiarity with the Prosody/Gajim/Poezio codebases and plugin architectures.


### SRS deck generator

Spaced Repetition Systems (SRS) such as Anki and Mnemosyne are popular tools for learning languages. However, preparing a good SRS deck is time-consuming. An automated way to generate a deck from a list of sentences (e.g., sentences on a Tatoeba list, sentences with a specific tag, etc.) would therefore help language learners.

**Deliverables**: an application (preferably web-based) that uses the Tatoeba database (for example, in the form of a weekly CSV data dump) to create SRS decks for major flash card applications. Examples of features:

* Generate a simple deck from a Tatoeba list, tag, or search query.
* Generate an N+1-style deck based on the user's list of known words and the Tatoeba database. (The user gives a list of N words that they already know. The system chooses new sentences in which exactly one word is unknown and the rest belong to the known set.)
* Generated decks have a proper internal structure (for Anki decks: a proper field scheme is used to store knowledge, so editing is easy).

**Prerequisite knowledge**: any web stack, though Django or CppCMS is preferred; Python or C++; knowledge of SRS.
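
The N+1 selection described above can be sketched in a few lines of Python. This is a simplified illustration: the tokenizer is a naive assumption that splits on non-letter characters, so it only suits languages written with spaces between words (Japanese or Chinese would need a real word segmenter), and the function names are made up for this example.

```python
import re


def tokenize(sentence):
    """Lower-cased set of words in a sentence (naive: runs of letters only)."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))


def n_plus_one_sentences(sentences, known_words):
    """Return (sentence, new_word) pairs where exactly one word is unknown."""
    known = {word.lower() for word in known_words}
    selected = []
    for sentence in sentences:
        unknown = tokenize(sentence) - known
        if len(unknown) == 1:
            selected.append((sentence, next(iter(unknown))))
    return selected


def to_tsv(cards):
    """Format (front, back) pairs as tab-separated lines, one card per line."""
    return "".join(f"{front}\t{back}\n" for front, back in cards)


example_known = ["I", "like", "tea", "the", "cat"]
example_corpus = ["I like tea.", "I like green tea.", "The cat sleeps.", "Dogs bark loudly."]
print(to_tsv(n_plus_one_sentences(example_corpus, example_known)), end="")
```

Plain tab-separated text is used here because flash card programs such as Anki can generally import it; producing native deck files with a proper field scheme would be a further step.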

### Browsable graph of sentence links

Given a sentence, display it and its linked sentences, up to a given depth, as a [graph](https://en.wikipedia.org/wiki/Graph_%28data_structure%29) ([like this](http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html#rule2)). The main purpose of such a graph is to show users at a glance how Tatoeba is structured. The current interface doesn’t provide such a view, but it’s important that users understand the actual structure of Tatoeba. This idea could be freely extended into a complete interface that allows linking and unlinking with a click, filtering by language, editing sentences, or whatever else you can think of.

**Deliverables**: a web application or a client-side JavaScript program that provides a graph view of a group of sentences and allows manipulating them. The code can either operate on the database directly or use existing or planned APIs.

Note that this idea can be implemented as part of the current code base in PHP or as an experimental service for a new CppCMS site (preferred).

**Prerequisite knowledge**: PHP, Python or CppCMS.
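
Collecting "the linked sentences up to a given depth" is a breadth-first traversal of the link graph. Here is a minimal Python sketch; it assumes the links are already available as a dictionary mapping each sentence id to the ids it is directly linked to, which is a simplification of however the data would really be loaded.

```python
from collections import deque


def linked_sentences_within_depth(links, start_id, max_depth):
    """Breadth-first search over sentence links.

    links: dict mapping a sentence id to an iterable of directly linked ids.
    Returns a dict {sentence_id: depth} for every sentence reachable from
    start_id in at most max_depth steps (the start sentence has depth 0).
    """
    depths = {start_id: 0}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in links.get(current, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[current] + 1
                queue.append(neighbor)
    return depths


example_links = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(linked_sentences_within_depth(example_links, 1, 2))
```

The depth recorded for each sentence could drive the layout, for example by drawing direct translations closest to the starting sentence and indirect ones further out.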

Version at: 09/06/2014, 03:33

**[!]** If you are interested in coding a Google Summer of Code project for Tatoeba during the summer of 2014, please submit your application on the [GSoC website](https://www.google-melange.com/gsoc/homepage/google/gsoc2014). You should apply on the website for every organization that you'd be interested in working for. The deadline is **March 21**. 



GSoC ideas for student projects
===============================

This page lists example ideas for students who would like to take part in [Google Summer of Code](http://www.google-melange.com/) and be mentored by Tatoeba. To quote [GSoC FAQ](http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page#3._What_is_an_Ideas_list):

<blockquote>
<p>An Ideas list should be a list of suggested student projects. This list is meant to introduce contributors to your project's needs and to provide inspiration to would-be student applicants. It is useful to classify each idea as specifically as possible, e.g. "must know Python" or "easier project; good for a student with more limited experience with C++." If your organization plans to provide a proposal template for the students, it would be good to include it on your Ideas list.</p>

<p>Keep in mind that your ideas list should be a starting point for student proposals; we've heard from past mentoring organization participants that some of their best student projects are those that greatly expanded on a proposed idea or were blue-sky proposals not mentioned on the ideas list at all. A link to a bug tracker for your open source organization is NOT an ideas list.</p>

<p>You can check out the <a href="http://community.kde.org/GSoC/2011/Ideas">Ideas list for KDE</a> for Google Summer of Code in 2011 to get an idea of what we’re looking for in an ideas list. </p>
</blockquote>

If you're a student, you're invited to discuss any of these ideas, as well as propose your own. We encourage you to get familiar with the [site](http://tatoeba.org) and the [codebase](main#developers) immediately. Registering on the site, looking at the existing sentences, and adding a few of your own (preferably in the language that you know best) will be a valuable experience for you. 

## Contact

To contact us developers, use one of these:

* Email/Google groups: [Tatoeba's dev mailing list](https://groups.google.com/forum/#!forum/tatoebaproject)
* IRC: [Tatoeba on #Freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join)

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](http://tatoeba.org/wall/index).


About Tatoeba (for students)
----------------------------

Tatoeba is a libre/free database of example sentences translated into many 
languages. Our goal is to create a resource for people studying 
languages—either to learn or research. The database is currently used:

* As a source of example sentences by free dictionaries and language 
learning websites (like Jim Breen’s WWWJDIC; Jim Breen is actually a 
member too):

  * There's a list of free dictionary and language learning websites
 using Tatoeba's corpus maintained by our member CK:
        http://a4esl.com/temporary/tatoeba/links.html

* As a rich resource for language learners: They can find out how to 
use words or how to translate grammatical constructs and idioms.

* For research: example papers include:

  * Research on treebanking Japanese (Francis Bond, 栗林 孝行 [Takayuki 
Kuribayashi], 橋本 力 [Hashimoto Chikara] (2008) HPSGに基づくフリーな日本語ツリー バンクの構築 
[A free Japanese Treebank based on HPSG]. In 14th Annual Meeting of The 
Association for Natural Language Processing, Tokyo),

  * Statistical machine translation (Eric Nichols, Francis Bond, 
Darren Scott Appling and Yuji Matsumoto (2010) Paraphrasing Training 
Data for Statistical Machine Translation. Journal of Natural Language 
Processing, 17(3), pages 101-122)

The main site currently has about 1 million page views and 250 thousand unique visitors monthly, as reported by Google Analytics, and the corpus is growing steadily by 3% or more every month.


Current site
------------

Extending current PHP site: 

* programming in PHP with the CakePHP framework
* shell tools for maintenance (e.g.: better export scripts)
* JavaScript

### Better export scripts

Currently CSV dumps are done weekly. They require the database to be switched into a read-only mode, take 5~10 minutes, and do not contain some important information, such as tag creator and comments. CSV dumps are important for people who cooperate with Tatoeba by creating additional tools, so their quality is vital for healthy collaboration.

**Deliverables**: An database export mechanism that:

  * Dumps all interesting information (everything currently in the data dumps plus modification history, sentence comments, the wall, etc.).
  * Can create incremental dumps, making them faster, which will allow them to be made more frequently.
  * Provides an interface for collaborators to get notifications about new dumps and allows automatic access.
  * (Advanced) Provides a stream of updates via web sockets or a similar mechanism.

**Prerequisite knowledge**: a scripting language (Python preferred), PHP, MySQL.

### Administrative Scripts

The current site has experienced a couple of crashes and instabilities. We would like to increase the number of users who are fully capable of administering it with ease and reliability. In order to ease administration and quickly recover from disaster, a number of scripts covering common administrative tasks are needed.

**Deliverables**:

*Shell scripts* that cover:

 * backup
 * restoring of backups
 * export of data
 * import of data 
 * adding new languages
 * deduplication of sentences
 * indexing of sentences by the search daemon
 * preserving existing translations when typos in the source UI strings are fixed 
 * getting external services up and running
 * updating the production site from the repository
 * deployment on a real server from scratch
 * deployment on a development machine from scratch
 * monitoring the server and logging load and activity
 * other necessary tasks

An *administrative interface* accessible by admins to manually execute any of these tasks on the server with progress bars and statistics. 

**Prerequisite knowledge**: a scripting language (bash, Python, Perl, etc..), possibly familiarity with a build system (ansible, vagrant), and possibly familiarity with setting up and maintaining a monitoring system (newrelic, nagios, cacti, munin)

### Testing Suite

The current website doesn't have any tangible automated tests or any form of continuous integration. Any new code that gets added to the repository can break the existing website and most testing is done manually at this point. In order to make the code future proof and give the users a stable experience, a battery of tests for the main functionality of the website is needed.

**Deliverables**: A number of tests that create a test database and emulate a browser, testing some or all of the functionality that the website currently offers. Refer to this [list](http://en.wiki.tatoeba.org/articles/show/functionality-test-list)

**Prerequisite knowledge**: a scripting language (Python, PHP, etc...), MySQL, and a web testing framework (selenium, or something similar).

### API

The Tatoeba database is used either through the main website interface or through data dumps. Having a real API that can be called through AJAX and return machine-readable results would provide real-time access for external applications.

**Deliverables**: A web application that provides a set of API calls for data stored in the current database. The API should cover all data available through the current web interface, including sentence comments, wall comments, recently-added sentences and top recent contributors.

**Prerequisite knowledge**: a web application language (Python or PHP preferred), MySQL, and a data exchange format such as JSON or XML.

### API Compliant Interface

With the presence of an API and an API [spec](https://github.com/trang/tatoeba-api/wiki/Tatoeba-API-specification-2) most direct SQL queries in the current codebase will become obsolete and the preferred way to do it would be through the API. So a rewrite or a completely new Interface will be the next logical step.

**Deliverables**: A rewrite of the current codebase to use the API instead of SQL queries or a completely new interface in another language, preferably in a javascript framework.

**Prerequisite knowledge**: strong knowledge of CakePHP, familiarity with the API, and familiarity with the current codebase, or strong knowledge of a web framework (django for example) or a javascript framework (angularjs for example).

### Improvements in user interface for end users

Tatoeba now handles several kinds of queries, but more are desired. The translation interface, in particular, needs improvement. Examples of desired types of queries:

* Get all sentences in a given language by a given user that have not been translated into a given language. For example: *Show me all English sentences by user "CK" not yet translated into Japanese.*

* Same as above, but limited to sentences with audio. For example: *Show me all English sentences by "CK" with audio that have not been translated into Japanese.*

* Get all sentences by native speakers of a given language not yet translated into a given language. For example: *Show me all Finnish sentences by native speakers not translated into Hungarian.*

* Get all sentences in a given language with a certain tag not translated into a given language. For example: *Show me all Georgian sentences with the tag "restaurant" not translated into Armenian.*

* Same as above, but limited to sentences by native speakers not translated into a given language. For example: *Show me all Korean sentences by native speakers with the tag "weather" not translated into Japanese.*

* Get all sentences in a given language under a certain length not yet translated into a given language. For example: *Show me all Japanese sentences fewer than 50 characters in length not translated into French.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

* Get all sentences in a given language that match a given search keyword and have not been translated into a given language. For example: *Show all English sentences with the word "mountain" not translated into Japanese.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

**Deliverables:** Implementation of some or all of the above. Project might include additional queries. It would be highly desired to provide a generic way of adding new types of queries.

**Prerequisite knowledge**: PHP, CakePHP.

### Allow users to follow each other

This feature would allow users to keep track of other users' newly created sentences, much as Twitter does. Users could be notified of new sentences by the users they follow, and browse them. Whether the list of who follows whom is public or private should be discussed prior to development.

**Deliverables:** a means of following one or more users; a page that displays the sentences of the followed users and allows browsing and searching through them; configurable notifications about new sentences by the followed users; a display of who follows whom.

**Prerequisite knowledge**: PHP, CakePHP.

### Word requests

This feature would allow users to request example sentences that show the correct usage of a given word or phrase. Contributors could browse lists of 'requested words' and add sentences that include them.

People interested in this idea should consider and discuss possible implementation details prior to development. Typical questions include:

* What should be the scope of the lists (per-language, per-user…)? 
* How should the lists be maintained? 
* Can we indicate that a 'requested word' now has enough example sentences? If yes, how? 
* What’s the lifecycle of a typical requested word? 
* What if users want to express additional information in their requests, such as the context or sense for the requested word? 
* What about synonyms, inflections… ?

**Deliverables:** a means of requesting example sentences for a given word; a way to easily contribute new sentences that illustrate requested words.

**Prerequisite knowledge**: PHP, CakePHP.

### Show pronunciation in IPA for sentences

IPA stands for "International Phonetic Alphabet" and is used to describe pronunciation of human languages in an unambiguous way. As such, it helps learning languages whose pronunciation rules are complex (e.g., English). Tatoeba could display IPA pronunciation for each sentence in basically the same way it currently displays pronunciation for Japanese using kana. One possible way of performing the task is to use an external library or application to prepare IPA annotations. For example, [eSpeak](http://espeak.sourceforge.net/) seems to be able to handle several popular languages and has an IPA converter.

**Deliverables:** A mechanism that shows IPA pronunciation for some languages (chosen by the student). This can be done server-side (as a standalone service or part of existing code) or client-side (using JavaScript). The mechanism should allow pronunciation descriptions to be pre-generated and should provide a means to edit them manually later. It can rely on a third-party tool to generate pronunciation descriptions.

**Prerequisite knowledge**: web technology, some web application stack (PHP, Python or CppCMS preferred).

New site & CppCMS
-----------------

Helping Sysko with tatowiki, tatodb. Extending [CppCMS](http://cppcms.com/wikipp/en/page/main). As the new site is still mostly being planned, there are no specific project ideas for the moment. Please ask on the IRC channel for more information. Note that many projects in this category will have an experimental nature, and their scope highly depends on your skills.

Standalone user tools
---------------------

Work on tools such as the following:

* [shtooka recorder](http://a4esl.com/temporary/tatoeba/shtooka/) or [swac-record](http://zmoo.fr/swac-tools/) for recording audio for sentences
* [tatoparser](https://github.com/qdii/tatoeba_parser) for retrieving sentences that match regular expressions
* [katoeba](https://github.com/sadhen/katoeba) 

Create new tools, such as apps for smartphones, for ordinary or advanced contributors.

### Android/iPhone application

About 12% of the site's visitors use an iPhone and about 7% use Android. A dedicated application could help them immensely.

**Deliverables**: A smartphone application for easy access to Tatoeba. Examples of features:

* Querying the online Tatoeba site
* Adding sentences
* Translating
* Performing typical corpus maintenance tasks (linking/unlinking sentences, changing sentence language, tagging, etc.)
* Access to wall and sentence comments
* Recording voice
* Offline database access (more difficult!)

Note: You are not expected to implement all of these features during a single GSoC. Depending on your skills, you might prepare a proposal for a basic set of features (if you don't have much experience in mobile development yet) or a more complex or targeted application (if you do have experience and want to prepare something more feature-complete).

**Prerequisite knowledge**: Java and Android development, or iPhone and iOS development; using web services.

### Streamlined linking of multiple sentences

Where multiple sentences in a source language have the same translation in the target language, make it easy to link those source sentences to the same target translation. Collecting the sentences that are likely to have the same translation could be as simple as presenting sentences in order of creation, since variants of a sentence that vary only in, e.g., the number of the pronoun (where the singular and plural forms of the second person map to the same word in English) are likely to be entered consecutively.

**Prerequisite knowledge**: JavaScript, possibly Java, possibly SQL

### Help bots

Produce bots like [those in Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Bots), to help with maintenance and repetitive tasks such as fixing common mistakes, wrong flag, etc. As on Wikipedia, users that are actually bots should be identified somehow on the website side. Ideally, create a library that could be used as a base to create bots that interact with the Tatoeba website.

Note: this idea would highly benefit from having a real API, which is another project listed here.

**Prerequisite knowledge**: web 

Other ideas
-----------

External services, like [CK's Temporary Tatoeba site](http://a4esl.com/temporary/tatoeba/). Other ideas.

### XMPP Integration for Tatoeba

Use the XMPP communications protocol to integrate the manipulation of sentences, comments, and wall posts, as well as live feeds of latest comments, sentence additions, and wall posts, with services such as pubsub.

**Deliverables**:

* An XEP that outlines the protocol Tatoeba would use for all of those operations over XMPP, ready to be submitted to the XSF.
* An implementation of this XEP as plugins for XMPP clients; poezio and gajim are top priorities.
* A server-side implementation of this XEP as a module; prosody is a top priority.

**Prerequisite knowledge**: XMPP, PubSub, Python, Lua, familiarity with the prosody/gajim/poezio codebases and plugin architecture.


### SRS deck generator

Spaced Repetition Systems such as Anki and Mnemosyne are popular tools for learning languages. However, preparing a good SRS deck is a time-consuming task. Therefore, an automated way to generate a deck from a list of sentences (e.g., sentences on a Tatoeba list, sentences tagged by some specific tag, etc.) would help language learners.

**Deliverables**: an application (preferably a web-based one) that would use the Tatoeba database (for example, in the form of a weekly CSV data dump) to create SRS decks for major flash card applications. Examples of features:

* Generate a simple deck from a Tatoeba list, tag, or search query.
* Generate an N+1-style deck based on user's list of known words and Tatoeba database. (User gives a list of N words that s/he already knows. System chooses a new sentence where exactly one word is unknown, and the rest belong to the already known set.)
* Generated decks have proper internal structure (as for Anki decks: proper field scheme is used to store knowledge, so editing is easy).

**Prerequisite knowledge**: any web stack, though Django or CppCMS is preferred; Python or C++; knowledge of SRS.

### Browsable graph of sentence links

Given a sentence, display it together with its linked sentences, up to a given depth, as a [graph](https://en.wikipedia.org/wiki/Graph_%28data_structure%29) ([like this](http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html#rule2)). The main purpose of such a graph is to show users at a glance how Tatoeba is structured. The current interface doesn't provide such a view, but it's important that users understand the actual structure of Tatoeba. This idea could be freely extended into a complete interface that allows linking and unlinking with a click, filtering by language, editing sentences, or whatever you can think of.

**Deliverables**: a web application or a client-side JavaScript program that provides a graph view of a group of sentences, and allows manipulating them. Code can either operate on database directly or use existing or planned APIs.

Note that this idea can be implemented as part of the current code base in PHP or as an experimental service for a new CppCMS site (preferred).

**Prerequisite knowledge**: PHP, Python or CppCMS.

version at: 09/06/2014, 03:36

This page was designed for students who were interested in coding a [Google Summer of Code 2014](https://www.google-melange.com/gsoc/homepage/google/gsoc2014) project for Tatoeba. While the deadline (March 21, 2014) passed a while ago, and the selected GSoC students have already started work on their projects, the information on this page should still be of general interest to developers. 

GSoC ideas for student projects
===============================

This page lists example ideas for students who would like to take part in [Google Summer of Code](http://www.google-melange.com/) and be mentored by Tatoeba. To quote [GSoC FAQ](http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page#3._What_is_an_Ideas_list):

<blockquote>
<p>An Ideas list should be a list of suggested student projects. This list is meant to introduce contributors to your project's needs and to provide inspiration to would-be student applicants. It is useful to classify each idea as specifically as possible, e.g. "must know Python" or "easier project; good for a student with more limited experience with C++." If your organization plans to provide a proposal template for the students, it would be good to include it on your Ideas list.</p>

<p>Keep in mind that your ideas list should be a starting point for student proposals; we've heard from past mentoring organization participants that some of their best student projects are those that greatly expanded on a proposed idea or were blue-sky proposals not mentioned on the ideas list at all. A link to a bug tracker for your open source organization is NOT an ideas list.</p>

<p>You can check out the <a href="http://community.kde.org/GSoC/2011/Ideas">Ideas list for KDE</a> for Google Summer of Code in 2011 to get an idea of what we’re looking for in an ideas list. </p>
</blockquote>

If you're a student, you're invited to discuss any of these ideas, as well as propose your own. We encourage you to get familiar with the [site](http://tatoeba.org) and the [codebase](main#developers) immediately. Registering on the site, looking at the existing sentences, and adding a few of your own (preferably in the language that you know best) will be a valuable experience for you. 

## Contact

To contact us developers, use one of these:

* Email/Google groups: [Tatoeba's dev mailing list](https://groups.google.com/forum/#!forum/tatoebaproject)
* IRC: [Tatoeba on #Freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join)

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](http://tatoeba.org/wall/index).


About Tatoeba (for students)
----------------------------

Tatoeba is a libre/free database of example sentences translated into many languages. Our goal is to create a resource for people studying languages, whether to learn them or to research them. The database is currently used:

* As a source of example sentences by free dictionaries and language learning websites (like Jim Breen’s WWWJDIC; Jim Breen is actually a member too):

  * There's a list of free dictionary and language learning websites using Tatoeba's corpus, maintained by our member CK: http://a4esl.com/temporary/tatoeba/links.html

* As a rich resource for language learners: they can find out how to use words or how to translate grammatical constructs and idioms.

* For research. Example papers include:

  * Research on treebanking Japanese (Francis Bond, 栗林 孝行 [Takayuki Kuribayashi], 橋本 力 [Hashimoto Chikara] (2008) HPSGに基づくフリーな日本語ツリーバンクの構築 [A free Japanese Treebank based on HPSG]. In 14th Annual Meeting of The Association for Natural Language Processing, Tokyo)

  * Statistical machine translation (Eric Nichols, Francis Bond, Darren Scott Appling and Yuji Matsumoto (2010) Paraphrasing Training Data for Statistical Machine Translation. Journal of Natural Language Processing, 17(3), pages 101-122)

The main site currently has about 1 million page views and 250 thousand unique visitors monthly, as reported by Google Analytics, and the corpus is growing steadily by 3% or more every month.


Current site
------------

Extending current PHP site: 

* programming in PHP with the CakePHP framework
* shell tools for maintenance (e.g.: better export scripts)
* JavaScript

### Better export scripts

Currently CSV dumps are done weekly. They require the database to be switched into read-only mode, take 5 to 10 minutes, and omit some important information, such as tag creators and comments. CSV dumps are important for people who cooperate with Tatoeba by creating additional tools, so their quality is vital for healthy collaboration.

**Deliverables**: A database export mechanism that:

  * Dumps all interesting information (everything currently in the data dumps plus modification history, sentence comments, the wall, etc.).
  * Can create incremental dumps, making them faster, which will allow them to be made more frequently.
  * Provides an interface for collaborators to get notifications about new dumps and allows automatic access.
  * (Advanced) Provides a stream of updates via web sockets or a similar mechanism.

**Prerequisite knowledge**: a scripting language (Python preferred), PHP, MySQL.
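
As a rough illustration of the incremental-dump idea, here is a minimal Python sketch. It assumes each row carries a modification timestamp; the row layout, the field names (`id`, `lang`, `text`, `modified`) and the function name `incremental_dump` are hypothetical and are not taken from the current Tatoeba schema or export scripts.

```python
import csv
import io
from datetime import datetime


def incremental_dump(rows, since, out):
    """Write rows modified after `since` as tab-separated values.

    Returns the newest timestamp seen, to be used as the next checkpoint.
    The row layout is an assumption made for this sketch only.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    newest = since
    for row in rows:
        if row["modified"] > since:
            writer.writerow(
                [row["id"], row["lang"], row["text"], row["modified"].isoformat()]
            )
            newest = max(newest, row["modified"])
    return newest


# Hypothetical sample data standing in for database rows.
rows = [
    {"id": 1, "lang": "eng", "text": "Hello.", "modified": datetime(2014, 6, 1)},
    {"id": 2, "lang": "fra", "text": "Bonjour.", "modified": datetime(2014, 6, 8)},
]
buf = io.StringIO()
checkpoint = incremental_dump(rows, datetime(2014, 6, 5), buf)
```

Only the row changed after the previous checkpoint is written, which is what would make such dumps cheap enough to run more often than weekly. A real implementation would also have to record deletions, which this sketch ignores.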

### Administrative Scripts

The current site has experienced a couple of crashes and instabilities. We would like to increase the number of users who are fully capable of administering it with ease and reliability. In order to ease administration and quickly recover from disaster, a number of scripts covering common administrative tasks are needed.

**Deliverables**:

*Shell scripts* that cover:

 * backup
 * restoring of backups
 * export of data
 * import of data 
 * adding new languages
 * deduplication of sentences
 * indexing of sentences by the search daemon
 * preserving existing translations when typos in the source UI strings are fixed 
 * getting external services up and running
 * updating the production site from the repository
 * deployment on a real server from scratch
 * deployment on a development machine from scratch
 * monitoring the server and logging load and activity
 * other necessary tasks

An *administrative interface* accessible by admins to manually execute any of these tasks on the server with progress bars and statistics. 

**Prerequisite knowledge**: a scripting language (bash, Python, Perl, etc.), possibly familiarity with a build system (Ansible, Vagrant), and possibly familiarity with setting up and maintaining a monitoring system (New Relic, Nagios, Cacti, Munin).

### Testing Suite

The current website doesn't have any tangible automated tests or any form of continuous integration. Any new code that gets added to the repository can break the existing website and most testing is done manually at this point. In order to make the code future proof and give the users a stable experience, a battery of tests for the main functionality of the website is needed.

**Deliverables**: A number of tests that create a test database and emulate a browser, testing some or all of the functionality that the website currently offers. Refer to this [list](http://en.wiki.tatoeba.org/articles/show/functionality-test-list).

**Prerequisite knowledge**: a scripting language (Python, PHP, etc.), MySQL, and a web testing framework (Selenium or something similar).

### API

The Tatoeba database is used either through the main website interface or through data dumps. Having a real API that can be called through AJAX and return machine-readable results would provide real-time access for external applications.

**Deliverables**: A web application that provides a set of API calls for data stored in the current database. The API should cover all data available through the current web interface, including sentence comments, wall comments, recently-added sentences and top recent contributors.

**Prerequisite knowledge**: a web application language (Python or PHP preferred), MySQL, and a data exchange format such as JSON or XML.
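
To make "machine-readable results" a little more concrete, the sketch below shows one possible JSON shape for a sentence and its translations. The field names and the helper `sentence_to_json` are assumptions chosen for illustration; they do not come from the API specification or from any existing Tatoeba code.

```python
import json


def sentence_to_json(sentence, translations):
    """Serialise a sentence and its direct translations to a JSON string.

    The payload layout is hypothetical, not an agreed API format.
    """
    payload = {
        "id": sentence["id"],
        "lang": sentence["lang"],
        "text": sentence["text"],
        "translations": [
            {"id": t["id"], "lang": t["lang"], "text": t["text"]}
            for t in translations
        ],
    }
    # ensure_ascii=False keeps non-Latin scripts readable in the output.
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


response = sentence_to_json(
    {"id": 1, "lang": "eng", "text": "Hello."},
    [{"id": 2, "lang": "jpn", "text": "こんにちは。"}],
)
```

An AJAX client could parse such a response directly, without scraping HTML or waiting for the next data dump.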

### API Compliant Interface

With an API and an API [spec](https://github.com/trang/tatoeba-api/wiki/Tatoeba-API-specification-2) in place, most direct SQL queries in the current codebase will become obsolete, and the preferred way to access data will be through the API. A rewrite of the current interface, or a completely new one, is the next logical step.

**Deliverables**: A rewrite of the current codebase to use the API instead of SQL queries, or a completely new interface in another language, preferably built on a JavaScript framework.

**Prerequisite knowledge**: either strong knowledge of CakePHP plus familiarity with the API and the current codebase, or strong knowledge of a web framework (Django, for example) or a JavaScript framework (AngularJS, for example).

### Improvements in user interface for end users

Tatoeba now handles several kinds of queries, but more are desired. The translation interface, in particular, needs improvement. Examples of desired types of queries:

* Get all sentences in a given language by a given user that have not been translated into a given language. For example: *Show me all English sentences by user "CK" not yet translated into Japanese.*

* Same as above, but limited to sentences with audio. For example: *Show me all English sentences by "CK" with audio that have not been translated into Japanese.*

* Get all sentences by native speakers of a given language not yet translated into a given language. For example: *Show me all Finnish sentences by native speakers not translated into Hungarian.*

* Get all sentences in a given language with a certain tag not translated into a given language. For example: *Show me all Georgian sentences with the tag "restaurant" not translated into Armenian.*

* Same as above, but limited to sentences by native speakers not translated into a given language. For example: *Show me all Korean sentences by native speakers with the tag "weather" not translated into Japanese.*

* Get all sentences in a given language under a certain length not yet translated into a given language. For example: *Show me all Japanese sentences fewer than 50 characters in length not translated into French.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

* Get all sentences in a given language that match a given search keyword and have not been translated into a given language. For example: *Show all English sentences with the word "mountain" not translated into Japanese.*

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given user.

**Deliverables:** Implementation of some or all of the above. Project might include additional queries. It would be highly desired to provide a generic way of adding new types of queries.

**Prerequisite knowledge**: PHP, CakePHP.
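
One way to think about "a generic way of adding new types of queries" is to treat each criterion as a small, composable filter. The sketch below does this in Python on in-memory data, purely as an illustration: the actual project would be written in PHP/CakePHP against the database, and the `Sentence` fields and filter names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    # Hypothetical in-memory model; not the real Tatoeba schema.
    id: int
    lang: str
    text: str
    owner: str
    has_audio: bool = False
    tags: frozenset = frozenset()
    translated_into: frozenset = frozenset()


# Each helper returns a predicate, so criteria can be combined freely.
def in_language(lang):
    return lambda s: s.lang == lang


def by_user(name):
    return lambda s: s.owner == name


def with_audio():
    return lambda s: s.has_audio


def with_tag(tag):
    return lambda s: tag in s.tags


def shorter_than(n):
    return lambda s: len(s.text) < n


def not_translated_into(lang):
    return lambda s: lang not in s.translated_into


def find(sentences, *filters):
    """Return the sentences that satisfy every given filter."""
    return [s for s in sentences if all(f(s) for f in filters)]


corpus = [
    Sentence(1, "eng", "I climbed the mountain.", "CK", has_audio=True),
    Sentence(2, "eng", "It is raining.", "CK", translated_into=frozenset({"jpn"})),
    Sentence(3, "fin", "Hyvää huomenta.", "someone", tags=frozenset({"greeting"})),
]

# "Show me all English sentences by user CK not yet translated into Japanese."
untranslated = find(
    corpus, in_language("eng"), by_user("CK"), not_translated_into("jpn")
)
```

Adding a new kind of query then means adding one more predicate helper rather than writing a new page for every combination.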

### Allow users to follow each other

This feature would allow users to keep track of other users' newly created sentences, much as Twitter does. Users could be notified of new sentences by the users they follow, and browse them. Whether the list of who follows whom is public or private should be discussed prior to development.

**Deliverables:** a means of following one or more users; a page that displays the sentences of the followed users and allows browsing and searching through them; configurable notifications about new sentences by the followed users; a display of who follows whom.

**Prerequisite knowledge**: PHP, CakePHP.

### Word requests

This feature would allow users to request example sentences that show the correct usage of a given word or phrase. Contributors could browse lists of 'requested words' and add sentences that include them.

People interested in this idea should consider and discuss possible implementation details prior to development. Typical questions include:

* What should be the scope of the lists (per-language, per-user…)? 
* How should the lists be maintained? 
* Can we indicate that a 'requested word' now has enough example sentences? If yes, how? 
* What’s the lifecycle of a typical requested word? 
* What if users want to express additional information in their requests, such as the context or sense for the requested word? 
* What about synonyms, inflections… ?

**Deliverables:** a means of requesting example sentences for a given word; a way to easily contribute new sentences that illustrate requested words.

**Prerequisite knowledge**: PHP, CakePHP.

### Show pronunciation in IPA for sentences

IPA stands for "International Phonetic Alphabet" and is used to describe pronunciation of human languages in an unambiguous way. As such, it helps learning languages whose pronunciation rules are complex (e.g., English). Tatoeba could display IPA pronunciation for each sentence in basically the same way it currently displays pronunciation for Japanese using kana. One possible way of performing the task is to use an external library or application to prepare IPA annotations. For example, [eSpeak](http://espeak.sourceforge.net/) seems to be able to handle several popular languages and has an IPA converter.

**Deliverables:** A mechanism that shows IPA pronunciation for some languages (chosen by the student). This can be done server-side (as a standalone service or part of existing code) or client-side (using JavaScript). The mechanism should allow pronunciation descriptions to be pre-generated and should provide a means to edit them manually later. It can rely on a third-party tool to generate pronunciation descriptions.

**Prerequisite knowledge**: web technology, some web application stack (PHP, Python or CppCMS preferred).
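
The combination of pre-generated and manually edited pronunciations could be modelled roughly as follows. This is a sketch under stated assumptions: `PronunciationStore` and its methods are hypothetical names, and the generator is a toy lookup table standing in for a real third-party tool such as eSpeak, whose invocation is deliberately not shown here.

```python
class PronunciationStore:
    """Keeps generated IPA per sentence and lets manual edits take precedence."""

    def __init__(self, generate):
        self._generate = generate  # callable: sentence text -> IPA string
        self._generated = {}
        self._manual = {}

    def pregenerate(self, sentences):
        """Fill the cache for a {sentence_id: text} mapping."""
        for sentence_id, text in sentences.items():
            self._generated[sentence_id] = self._generate(text)

    def edit(self, sentence_id, ipa):
        """Record a manual correction; it overrides the generated value."""
        self._manual[sentence_id] = ipa

    def get(self, sentence_id):
        """Return the manual value if present, else the generated one, else None."""
        return self._manual.get(sentence_id, self._generated.get(sentence_id))


# Toy stand-in for a real converter: a tiny word-to-IPA table.
TOY_TABLE = {"cat": "kæt", "hat": "hæt"}


def toy_generate(text):
    return " ".join(TOY_TABLE.get(word, "?") for word in text.lower().split())


store = PronunciationStore(toy_generate)
store.pregenerate({1: "cat hat", 2: "cat dog"})
before_edit = store.get(2)
store.edit(2, "kæt dɒɡ")
after_edit = store.get(2)
```

Keeping manual edits in a separate layer means that regenerating the automatic annotations later would not overwrite human corrections.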

New site & CppCMS
-----------------

Helping Sysko with tatowiki, tatodb. Extending [CppCMS](http://cppcms.com/wikipp/en/page/main). As the new site is still mostly being planned, there are no specific project ideas for the moment. Please ask on the IRC channel for more information. Note that many projects in this category will have an experimental nature, and their scope highly depends on your skills.

Standalone user tools
---------------------

Work on tools such as the following:

* [shtooka recorder](http://a4esl.com/temporary/tatoeba/shtooka/) or [swac-record](http://zmoo.fr/swac-tools/) for recording audio for sentences
* [tatoparser](https://github.com/qdii/tatoeba_parser) for retrieving sentences that match regular expressions
* [katoeba](https://github.com/sadhen/katoeba) 

Create new tools, such as apps for smartphones, for ordinary or advanced contributors.

### Android/iPhone application

About 12% of the site's visitors use an iPhone and about 7% use Android. A dedicated application could help them immensely.

**Deliverables**: A smartphone application for easy access to Tatoeba. Examples of features:

* Querying the online Tatoeba site
* Adding sentences
* Translating
* Performing typical corpus maintenance tasks (linking/unlinking sentences, changing sentence language, tagging, etc.)
* Access to wall and sentence comments
* Recording voice
* Offline database access (more difficult!)

Note: You are not expected to implement all of these features during a single GSoC. Depending on your skills, you might prepare a proposal for a basic set of features (if you don't have much experience in mobile development yet) or a more complex or targeted application (if you do have experience and want to prepare something more feature-complete).

**Prerequisite knowledge**: Java and Android development, or iPhone and iOS development; using web services.

### Streamlined linking of multiple sentences

Where multiple sentences in a source language have the same translation in the target language, make it easy to link those source sentences to the same target translation. Collecting the sentences that are likely to have the same translation could be as simple as presenting sentences in order of creation, since variants of a sentence that vary only in, e.g., the number of the pronoun (where the singular and plural forms of the second person map to the same word in English) are likely to be entered consecutively.

**Prerequisite knowledge**: JavaScript, possibly Java, possibly SQL

### Help bots

Produce bots like [those in Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Bots), to help with maintenance and repetitive tasks such as fixing common mistakes, wrong flag, etc. As on Wikipedia, users that are actually bots should be identified somehow on the website side. Ideally, create a library that could be used as a base to create bots that interact with the Tatoeba website.

Note: this idea would highly benefit from having a real API, which is another project listed here.

**Prerequisite knowledge**: web 

Other ideas
-----------

External services, like [CK's Temporary Tatoeba site](http://a4esl.com/temporary/tatoeba/). Other ideas.

### XMPP Integration for Tatoeba

Use the XMPP communications protocol to integrate the manipulation of sentences, comments, and wall posts, as well as live feeds of latest comments, sentence additions, and wall posts, with services such as pubsub.

**Deliverables**:

* An XEP that outlines the protocol Tatoeba would use for all of those operations over XMPP, ready to be submitted to the XSF.
* An implementation of this XEP as plugins for XMPP clients; poezio and gajim are top priorities.
* A server-side implementation of this XEP as a module; prosody is a top priority.

**Prerequisite knowledge**: XMPP, PubSub, Python, Lua, familiarity with the prosody/gajim/poezio codebases and plugin architecture.


### SRS deck generator

Spaced Repetition Systems such as Anki and Mnemosyne are popular tools for learning languages. However, preparing a good SRS deck is a time-consuming task. Therefore, an automated way to generate a deck from a list of sentences (e.g., sentences on a Tatoeba list, sentences tagged by some specific tag, etc.) would help language learners.

**Deliverables**: an application (preferably a web-based one) that would use the Tatoeba database (for example, in the form of a weekly CSV data dump) to create SRS decks for major flash card applications. Examples of features:

* Generate a simple deck from a Tatoeba list, tag, or search query.
* Generate an N+1-style deck based on user's list of known words and Tatoeba database. (User gives a list of N words that s/he already knows. System chooses a new sentence where exactly one word is unknown, and the rest belong to the already known set.)
* Generated decks have proper internal structure (as for Anki decks: proper field scheme is used to store knowledge, so editing is easy).

**Prerequisite knowledge**: any web stack, though Django or CppCMS is preferred; Python or C++; knowledge of SRS.
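
The N+1 selection described above can be sketched in a few lines of Python. The function names are hypothetical, and the tokenizer is a deliberately naive assumption: it splits on word characters, which is adequate for space-delimited languages but would not work for languages such as Japanese or Chinese without a proper segmenter.

```python
import re


def tokenize(text):
    """Naive tokenizer: lowercased runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def n_plus_one(sentences, known_words):
    """Return (sentence, new_word) pairs where exactly one word is unknown."""
    known = {w.lower() for w in known_words}
    picks = []
    for text in sentences:
        unknown = tokenize(text) - known
        if len(unknown) == 1:
            picks.append((text, next(iter(unknown))))
    return picks


candidates = ["I like tea.", "I like green tea.", "Cats sleep."]
deck = n_plus_one(candidates, {"I", "like", "tea"})
```

Here only "I like green tea." qualifies: the first sentence contains no unknown word and the last contains two. A deck generator would then add the new word to the known set and repeat, so that the deck grows one word at a time.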

### Browsable graph of sentence links

Given a sentence, display it together with its linked sentences, up to a given depth, as a [graph](https://en.wikipedia.org/wiki/Graph_%28data_structure%29) ([like this](http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html#rule2)). The main purpose of such a graph is to show users at a glance how Tatoeba is structured. The current interface doesn't provide such a view, but it's important that users understand the actual structure of Tatoeba. This idea could be freely extended into a complete interface that allows linking and unlinking with a click, filtering by language, editing sentences, or whatever you can think of.

**Deliverables**: a web application or a client-side JavaScript program that provides a graph view of a group of sentences, and allows manipulating them. Code can either operate on database directly or use existing or planned APIs.

Note that this idea can be implemented as part of the current code base in PHP or as an experimental service for a new CppCMS site (preferred).

**Prerequisite knowledge**: PHP, Python or CppCMS.
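
Collecting "the linked sentences up to a given depth" is a breadth-first traversal of the link graph. The sketch below assumes the links are available as an adjacency mapping from sentence ID to linked sentence IDs; that representation and the name `linked_within` are assumptions made for illustration, not part of the existing code.

```python
from collections import deque


def linked_within(links, start, max_depth):
    """Breadth-first search over sentence links.

    Returns {sentence_id: distance from start} for every sentence
    reachable in at most `max_depth` link hops.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(current, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    return depth


# Hypothetical link data: 1-2, 1-3, 2-4, 4-5 (links are symmetric).
links = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
nearby = linked_within(links, 1, 2)
```

With a depth of 2, sentence 5 is left out because it is three hops away from sentence 1. The returned distances could drive the layout, for example by placing direct translations closer to the central sentence than indirect ones.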
