Version at: 29/03/2013, 18:20 vs. version at: 29/03/2013, 18:30
Changes between the two versions (unchanged lines are omitted; both versions are reproduced in full below):

New lines 92-99 (added at the end of "Current site"): the section "Show pronunciation in IPA for sentences" and its introductory paragraph, moved here from "Other ideas" (old lines 152-154), followed by two new lines:

**Deliverables:** A mechanism that shows IPA pronunciation for some languages (chosen by student). This can be done server-side (as a standalone service or part of existing code) or client-side (using JavaScript). Mechanism should allow pre-generating pronunciation descriptions and should provide means to manually edit pronunciation later.

**Prerequisite knowledge**: web technology; some web application stack (PHP, Python or CppCMS preferred).

Old line 123 replaced by new line 131 (under "Streamlined linking of multiple sentences"):

* Removed: Skills: JavaScript, possibly Java, possibly SQL
* Added: **Prerequisite knowledge**: JavaScript, possibly Java, possibly SQL

Old line 143 replaced by new line 151 (under "SRS deck generator"):

* Removed: **Prerequisite knowledge**: any web stack, however Django or CppCMS are preferred; Python or C++; basic knowledge about how SRS works.
* Added: **Prerequisite knowledge**: any web stack, however Django or CppCMS are preferred; Python or C++; knowledge about SRS.

Old lines 149-155 replaced by new lines 157-161 (under "Browsable graph of sentences links"):

* Removed: the "Other ideas" heading and the "Show pronunciation in IPA for sentences" section (moved to "Current site", as noted above).
* Added: **Deliverables**: a web application or a client-side JavaScript program that provides a graph view of a group of sentences, and allows manipulating them. Code can either use database directly or use existing or planned APIs.
* Added: Note that this idea can also be implemented as part of current code base in PHP or as an experimental service for new CppCMS site.
* Added: **Prerequisite knowledge**: any web stack, however integration into current code is highly preferred.

Version at: 29/03/2013, 18:20

GSoC ideas for student projects
===============================

This page lists example ideas for students who would like to take part in Google Summer of Code and be mentored by Tatoeba. To quote [GSoC FAQ](http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page#3._What_is_an_Ideas_list):

<blockquote>
<p>An Ideas list should be a list of suggested student projects. This list is meant to introduce contributors to your project's needs and to provide inspiration to would-be student applicants. It is useful to classify each idea as specifically as possible, e.g. "must know Python" or "easier project; good for a student with more limited experience with C++." If your organization plans to provide a proposal template for the students, it would be good to include it on your Ideas list.</p>

<p>Keep in mind that your ideas list should be a starting point for student proposals; we've heard from past mentoring organization participants that some of their best student projects are those that greatly expanded on a proposed idea or were blue-sky proposals not mentioned on the ideas list at all. A link to a bug tracker for your open source organization is NOT an ideas list.</p>

<p>You can check out the <a href="http://community.kde.org/GSoC/2011/Ideas">Ideas list for KDE</a> for Google Summer of Code in 2011 to get an idea of what we’re looking for in an ideas list. </p>
</blockquote>

If you're a student, you're invited to discuss any of these ideas, as well as propose your own. To contact us, use one of these:

* [Tatoeba Wall page](http://tatoeba.org/wall/index)
* Email/Google groups: [Tatoeba GSoC mailing list](https://groups.google.com/forum/?fromgroups=#!forum/tatoeba-gsoc)
* IRC: [Tatoeba on #Freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.jabberfr.org](xmpp:tatoeba@chat.jabberfr.org?join)


Current site
------------

Extending current PHP site: programming in PHP, shell tools for maintenance (e.g.: better export scripts?), JavaScript?

### Better export scripts

Currently, CSV dumps are made weekly. They require the database to be switched into read-only mode, take 5 to 10 minutes, and omit some important information, such as tag creators and comments. CSV dumps are important for people who cooperate with Tatoeba by creating additional tools, so their quality is vital for healthy collaboration.

**Deliverables**: A database export mechanism that:

  * Dumps all interesting information (everything that's currently in the data dumps plus modification history, sentence comments, the wall etc.)
  * Can create an incremental dump (faster dumps will allow making them more often)
  * Provides an interface for collaborators to get notifications about new dumps and allows automatic access.
  * (Advanced) Provides a stream of updates in form of web sockets or a similar mechanism.

**Prerequisite knowledge**: a scripting language (Python preferred), PHP, MySQL.

### API

The Tatoeba database is currently used either through the main website interface or through data dumps. An API (for example, one that can be called through AJAX) would give external applications real-time access.

**Deliverables**: A web application that provides a set of API calls for data stored in the current database. The API should cover all data available through the current web interface, including sentence comments, wall comments, recently added sentences and top recent contributors.

**Prerequisite knowledge**: a web application language (Python or PHP preferred), MySQL.

### Improvements in user interface for end users

Tatoeba now handles several kinds of queries, but more are desired. The translation interface in particular needs improvement. Examples of desired types of queries:

* Get all sentences in a given language by a given username not yet translated into a given language.

 * For example: Show me all English sentences by CK not yet translated into Japanese.

* Same as above, but limited to sentences with audio.

 * For example: Show me all English sentences by CK with audio not yet translated into Japanese.

* Get all sentences by native speakers of a given language not yet translated into my own native language.

 * For example: Show me all English sentences by native speakers not yet translated into Japanese.

* Get all sentences in a given language with a certain tag not yet translated into a given language.

 * For example: Show me all English sentences with the tag “restaurant” not yet translated into Japanese.

* Same as above, but limited to sentences by native speakers not yet translated into a given language.

 * For example: Show me all English sentences by native speakers with the tag "weather" not yet translated into Japanese.

* Get all sentences in a given language under a certain length not yet translated into a given language.

 * For example: Show me all Japanese sentences less than 50 characters that aren't yet translated into English.

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given username.

* Get all sentences in a given language that match a given search keyword and aren't yet translated into a given language.

 * For example: Show all English sentences with the word "mountain" that aren't yet translated into Japanese.

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given username.

**Deliverables:** Implementation of some (all?) of the above. Project might include additional queries. It would be highly desired to provide a generic way of adding new types of queries.

**Prerequisite knowledge**: PHP, CakePHP.

New site & CppCMS
-----------------

Helping Sysko with tatowiki, tatodb. Extending [CppCMS](http://cppcms.com/wikipp/en/page/main). As the new site is still mostly in plans, there are no specific project ideas for the moment. Please ask on the IRC channel for more information. Note that many projects in this category will have an experimental nature, and their scope highly depends on your skills.

Standalone user tools
---------------------

Work on [shtooka recorder](http://a4esl.com/temporary/tatoeba/shtooka/) (or swac-record), [tatoparser](https://github.com/qdii/tatoeba_parser), [katoeba](https://github.com/sadhen/katoeba) and similar tools. Create new tools for advanced contributors and common users, like apps for smartphones.

### Android/iPhone application

iPhone users make up about 12% of the site's visitors, and Android users about 7%. A dedicated application could help them immensely.

**Deliverables**: A smartphone application for easy access to Tatoeba. Example features:
* Querying the online Tatoeba site
* Adding sentences
* Translating
* Performing typical corpus maintenance tasks
* Wall and sentence comments
* Recording voice
* Offline database access

Note: You are not expected to implement all of these features during a single GSoC. Depending on your skills, you might propose a basic set of features (if you don't have much experience in mobile development yet) or a more complex or targeted application (if you do have experience and want to build something more feature-complete).

**Prerequisite knowledge**: Java and Android development, or iPhone and iOS development; using web services.

### Streamlined linking of multiple sentences

Where multiple sentences in a source language have the same translation in the target language, make it easy to link those source sentences to the same target translation. Collecting the sentences that are likely to have the same translation could be as simple as presenting sentences in order of creation, since variants of a sentence that vary only in, e.g., the number of the pronoun (where the singular and plural forms of the second person map to the same word in English) are likely to be entered consecutively.

Skills: JavaScript, possibly Java, possibly SQL

### Help bots

Just like [bots on Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Bots), Tatoeba could have bots that help with maintenance and repetitive tasks, such as fixing common mistakes or wrong flags. As on Wikipedia, user accounts that are actually bots should be identifiable on the website. Ideally, the project would create a library for interacting with the Tatoeba website that could serve as a base for building bots.

External web applications
-------------------------

[CK's Temporary Tatoeba site](http://a4esl.com/temporary/tatoeba/) and similar pages.

### SRS deck generator

Spaced Repetition Systems (SRS) such as Anki or Mnemosyne are popular tools for learning languages. However, preparing a good SRS deck is a time-consuming task. An automated way to generate a deck from a list of sentences (e.g. sentences on a Tatoeba list or sentences with a specific tag) would therefore help language learners.

**Deliverables**: an application (preferably a web-based one) that would use Tatoeba database (for example in form of a weekly CSV data dump) to create SRS decks for major flash card applications. Example features:
* Generate a simple deck from a Tatoeba list, tag, search query.
* Generate an N+1-style deck based on user's list of known words and Tatoeba database. (User gives a list of N words that he already knows. System chooses a new sentence where exactly one word is unknown, and the rest belong to the already known set. Therefore, N+1).
* Generated decks have proper internal structure (e.g. for Anki decks: proper field scheme is used to store knowledge, so that editing is easy).

**Prerequisite knowledge**: any web stack, though Django or CppCMS is preferred; Python or C++; basic knowledge of how SRS works.
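
The N+1 selection described above can be sketched in a few lines. This is only an illustration: the function names are made up for the example, and the regular-expression tokenizer is a naive assumption that would not work for languages written without spaces, such as Japanese or Chinese.

```python
import re


def words(sentence):
    """Split a sentence into lowercase words (naive: runs of letters only)."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))


def n_plus_one(sentences, known_words):
    """Return (sentence, new_word) pairs where exactly one word is unknown."""
    known = {w.lower() for w in known_words}
    deck = []
    for sentence in sentences:
        unknown = words(sentence) - known
        if len(unknown) == 1:
            deck.append((sentence, next(iter(unknown))))
    return deck


# Hypothetical sample data standing in for a Tatoeba list.
known = ["i", "like", "the", "cat"]
corpus = ["I like the cat.", "I like the dog.", "The dog chased a ball."]
deck = n_plus_one(corpus, known)
```

Here only "I like the dog." qualifies: the first sentence contains no unknown word and the third contains several. A real generator would then add the new word to the known set and repeat, so that the deck grows one word at a time.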

### Browsable graph of sentence links

Given a sentence, display the linked sentences up to a given depth as a [graph](https://en.wikipedia.org/wiki/Graph_%28data_structure%29) [like this](http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html#rule2). The main purpose of such a graph is to show users at a glance how Tatoeba is structured: the current interface doesn’t provide such a view, yet it’s important that users understand the actual structure of Tatoeba. This idea could be freely extended into a complete interface that allows linking and unlinking with a click, filtering by language, editing sentences, or whatever else you can think of.
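
Collecting the sentences to draw is a depth-limited breadth-first search over the translation links. The sketch below assumes the links are available as a plain dictionary from sentence ID to linked sentence IDs; that representation and the function name are assumptions for the example, not how Tatoeba stores its data.

```python
from collections import deque


def linked_within(links, start, max_depth):
    """Breadth-first search over links; returns {sentence_id: depth}."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(current, ()):
            if neighbour not in depths:
                depths[neighbour] = depths[current] + 1
                queue.append(neighbour)
    return depths


# Hypothetical link data: sentence 1 is linked to 2 and 3, and so on.
links = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
nearby = linked_within(links, 1, 2)
```

With a depth of 2, sentence 5 is left out because it is three links away from sentence 1. The returned depths could be used by a front end to lay out direct and indirect translations differently.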

Other ideas
-----------

### Show pronunciation in IPA for sentences

IPA stands for “International Phonetic Alphabet” and is used to describe pronunciation of human languages in an unambiguous way. As such, it helps learning languages whose pronunciation rules are complex (e.g. English). Tatoeba could display IPA pronunciation for each sentence, basically the same way it currently displays pronunciation for Japanese using kana. One possible way of performing the task is to use an external library or application to prepare IPA annotations. For example, [eSpeak](http://espeak.sourceforge.net/) seems to be able to handle several popular languages and has an IPA converter.

Version at: 29/03/2013, 18:30

GSoC ideas for student projects
===============================

This page lists example ideas for students who would like to take part in Google Summer of Code and be mentored by Tatoeba. To quote [GSoC FAQ](http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page#3._What_is_an_Ideas_list):

<blockquote>
<p>An Ideas list should be a list of suggested student projects. This list is meant to introduce contributors to your project's needs and to provide inspiration to would-be student applicants. It is useful to classify each idea as specifically as possible, e.g. "must know Python" or "easier project; good for a student with more limited experience with C++." If your organization plans to provide a proposal template for the students, it would be good to include it on your Ideas list.</p>

<p>Keep in mind that your ideas list should be a starting point for student proposals; we've heard from past mentoring organization participants that some of their best student projects are those that greatly expanded on a proposed idea or were blue-sky proposals not mentioned on the ideas list at all. A link to a bug tracker for your open source organization is NOT an ideas list.</p>

<p>You can check out the <a href="http://community.kde.org/GSoC/2011/Ideas">Ideas list for KDE</a> for Google Summer of Code in 2011 to get an idea of what we’re looking for in an ideas list. </p>
</blockquote>

If you're a student, you're invited to discuss any of these ideas, as well as propose your own. To contact us, use one of these:

* [Tatoeba Wall page](http://tatoeba.org/wall/index)
* Email/Google groups: [Tatoeba GSoC mailing list](https://groups.google.com/forum/?fromgroups=#!forum/tatoeba-gsoc)
* IRC: [Tatoeba on #Freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.jabberfr.org](xmpp:tatoeba@chat.jabberfr.org?join)


Current site
------------

Extending current PHP site: programming in PHP, shell tools for maintenance (e.g.: better export scripts?), JavaScript?

### Better export scripts

Currently, CSV dumps are made weekly. They require the database to be switched into read-only mode, take 5 to 10 minutes, and omit some important information, such as tag creators and comments. CSV dumps are important for people who cooperate with Tatoeba by creating additional tools, so their quality is vital for healthy collaboration.

**Deliverables**: A database export mechanism that:

  * Dumps all interesting information (everything that's currently in the data dumps plus modification history, sentence comments, the wall etc.)
  * Can create an incremental dump (faster dumps will allow making them more often)
  * Provides an interface for collaborators to get notifications about new dumps and allows automatic access.
  * (Advanced) Provides a stream of updates in form of web sockets or a similar mechanism.

**Prerequisite knowledge**: a scripting language (Python preferred), PHP, MySQL.
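
As a rough illustration of the incremental dump idea, the sketch below exports only the rows changed since a checkpoint. The `sentences` table, its `modified` column and the function name are assumptions made for the example, and an in-memory SQLite database stands in for the real MySQL one.

```python
import csv
import io
import sqlite3


def dump_incremental(conn, since, out):
    """Write rows modified after `since` as TSV; return the new checkpoint."""
    rows = conn.execute(
        "SELECT id, lang, text, modified FROM sentences "
        "WHERE modified > ? ORDER BY modified, id",
        (since,),
    ).fetchall()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    # The latest timestamp seen becomes the starting point of the next dump.
    return rows[-1][3] if rows else since


# In-memory stand-in for the real database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentences "
    "(id INTEGER PRIMARY KEY, lang TEXT, text TEXT, modified TEXT)"
)
conn.executemany(
    "INSERT INTO sentences VALUES (?, ?, ?, ?)",
    [
        (1, "eng", "Hello.", "2013-03-01 10:00:00"),
        (2, "jpn", "こんにちは。", "2013-03-20 09:30:00"),
        (3, "fra", "Bonjour.", "2013-03-28 18:20:00"),
    ],
)

buffer = io.StringIO()
checkpoint = dump_incremental(conn, "2013-03-15 00:00:00", buffer)
```

Only sentences 2 and 3 end up in the dump, and the returned checkpoint is passed to the next run. A timestamp filter like this does not notice deleted rows, so a real mechanism would also need some record of deletions.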

### API

The Tatoeba database is currently used either through the main website interface or through data dumps. An API (for example, one that can be called through AJAX) would give external applications real-time access.

**Deliverables**: A web application that provides a set of API calls for data stored in the current database. The API should cover all data available through the current web interface, including sentence comments, wall comments, recently added sentences and top recent contributors.

**Prerequisite knowledge**: a web application language (Python or PHP preferred), MySQL.
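
To make the idea more concrete, here is one possible shape for the JSON body of a single API call that returns a sentence with its direct translations. The field names and the function are purely illustrative assumptions; no such API exists yet, and the actual design is up to the student.

```python
import json


def sentence_response(sentence, translations):
    """Build the JSON body for one sentence and its direct translations."""
    payload = {
        "id": sentence["id"],
        "lang": sentence["lang"],
        "text": sentence["text"],
        "translations": [
            {"id": t["id"], "lang": t["lang"], "text": t["text"]}
            for t in translations
        ],
    }
    # ensure_ascii=False keeps non-Latin scripts readable in the output.
    return json.dumps(payload, ensure_ascii=False)


# Hypothetical sample data.
body = sentence_response(
    {"id": 1, "lang": "eng", "text": "Hello."},
    [{"id": 2, "lang": "jpn", "text": "こんにちは。"}],
)
```

A client calling such an endpoint through AJAX would parse the body and display the translations without needing a data dump.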

### Improvements in user interface for end users

Tatoeba now handles several kinds of queries, but more are desired. The translation interface in particular needs improvement. Examples of desired types of queries:

* Get all sentences in a given language by a given username not yet translated into a given language.

 * For example: Show me all English sentences by CK not yet translated into Japanese.

* Same as above, but limited to sentences with audio.

 * For example: Show me all English sentences by CK with audio not yet translated into Japanese.

* Get all sentences by native speakers of a given language not yet translated into my own native language.

 * For example: Show me all English sentences by native speakers not yet translated into Japanese.

* Get all sentences in a given language with a certain tag not yet translated into a given language.

 * For example: Show me all English sentences with the tag “restaurant” not yet translated into Japanese.

* Same as above, but limited to sentences by native speakers not yet translated into a given language.

 * For example: Show me all English sentences by native speakers with the tag "weather" not yet translated into Japanese.

* Get all sentences in a given language under a certain length not yet translated into a given language.

 * For example: Show me all Japanese sentences less than 50 characters that aren't yet translated into English.

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given username.

* Get all sentences in a given language that match a given search keyword and aren't yet translated into a given language.

 * For example: Show all English sentences with the word "mountain" that aren't yet translated into Japanese.

* Same as above, but limited to native speaker sentences.

* Same as above, but limited to sentences by a given username.

**Deliverables:** Implementation of some (all?) of the above. Project might include additional queries. It would be highly desired to provide a generic way of adding new types of queries.

**Prerequisite knowledge**: PHP, CakePHP.
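
One way to think about a generic mechanism for new query types is to treat each criterion as a small, composable filter. The sketch below shows the idea in Python with an in-memory SQLite database. The real site is PHP/CakePHP on MySQL, and the `sentences` and `links` tables, their columns and all function names are assumptions made up for the example.

```python
import sqlite3


# Each filter returns (sql_condition, parameters). A new query type is added
# by writing another small function like these.
def in_language(lang):
    return "s.lang = ?", [lang]


def by_user(username):
    return "s.username = ?", [username]


def shorter_than(max_chars):
    return "LENGTH(s.text) < ?", [max_chars]


def not_translated_into(lang):
    return (
        "NOT EXISTS (SELECT 1 FROM links l "
        "JOIN sentences t ON t.id = l.translation_id "
        "WHERE l.sentence_id = s.id AND t.lang = ?)",
        [lang],
    )


def find_sentences(conn, *filters):
    """Combine any number of filters with AND and run the query."""
    conditions = [sql for sql, _ in filters]
    params = [p for _, ps in filters for p in ps]
    query = "SELECT s.id, s.text FROM sentences s"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return conn.execute(query + " ORDER BY s.id", params).fetchall()


# Hypothetical schema and sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sentences (id INTEGER, lang TEXT, username TEXT, text TEXT)")
conn.execute("CREATE TABLE links (sentence_id INTEGER, translation_id INTEGER)")
conn.executemany(
    "INSERT INTO sentences VALUES (?, ?, ?, ?)",
    [
        (1, "eng", "CK", "I like mountains."),
        (2, "eng", "CK", "It is raining."),
        (3, "jpn", "tanaka", "雨が降っています。"),
        (4, "eng", "other", "Good morning."),
    ],
)
conn.execute("INSERT INTO links VALUES (2, 3)")

# "Show me all English sentences by CK not yet translated into Japanese."
untranslated = find_sentences(
    conn, in_language("eng"), by_user("CK"), not_translated_into("jpn")
)
```

Because the filters are independent, the "same as above, but limited to..." variants in the list become a matter of passing one more filter rather than writing a new query by hand.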

### Show pronunciation in IPA for sentences

IPA stands for “International Phonetic Alphabet” and is used to describe pronunciation of human languages in an unambiguous way. As such, it helps learning languages whose pronunciation rules are complex (e.g. English). Tatoeba could display IPA pronunciation for each sentence, basically the same way it currently displays pronunciation for Japanese using kana. One possible way of performing the task is to use an external library or application to prepare IPA annotations. For example, [eSpeak](http://espeak.sourceforge.net/) seems to be able to handle several popular languages and has an IPA converter.

**Deliverables:** A mechanism that shows IPA pronunciation for some languages (chosen by student). This can be done server-side (as a standalone service or part of existing code) or client-side (using JavaScript). Mechanism should allow pre-generating pronunciation descriptions and should provide means to manually edit pronunciation later.

**Prerequisite knowledge**: web technology; some web application stack (PHP, Python or CppCMS preferred).
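
The two requirements above, pre-generating pronunciations and allowing later manual edits, could be combined roughly as follows. This is a sketch under assumptions: the class and function names are invented for the example, and the eSpeak call relies on the `-q`, `--ipa` and `-v` options of the `espeak` command-line tool, which should be checked against the installed version.

```python
import subprocess


def espeak_ipa(text, voice="en"):
    """Ask the eSpeak command-line tool for IPA (requires eSpeak installed)."""
    result = subprocess.run(
        ["espeak", "-q", "--ipa", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


class PronunciationStore:
    """Keeps generated IPA and lets manual corrections take precedence."""

    def __init__(self, generate=espeak_ipa):
        self._generate = generate
        self._generated = {}
        self._manual = {}

    def pregenerate(self, sentences):
        for sentence_id, text in sentences.items():
            self._generated[sentence_id] = self._generate(text)

    def edit(self, sentence_id, ipa):
        self._manual[sentence_id] = ipa

    def get(self, sentence_id):
        return self._manual.get(sentence_id, self._generated.get(sentence_id))


# A placeholder generator is used here so the example runs without eSpeak.
store = PronunciationStore(generate=lambda text: "/" + text.lower() + "/")
store.pregenerate({1: "Hello", 2: "Cat"})
store.edit(1, "həˈloʊ")
```

Keeping manual edits separate from generated output means the automatic transcriptions can be regenerated later, for example after an eSpeak upgrade, without overwriting contributors' corrections.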

New site & CppCMS
-----------------

Helping Sysko with tatowiki, tatodb. Extending [CppCMS](http://cppcms.com/wikipp/en/page/main). As the new site is still mostly in plans, there are no specific project ideas for the moment. Please ask on the IRC channel for more information. Note that many projects in this category will have an experimental nature, and their scope highly depends on your skills.

Standalone user tools
---------------------

Work on [shtooka recorder](http://a4esl.com/temporary/tatoeba/shtooka/) (or swac-record), [tatoparser](https://github.com/qdii/tatoeba_parser), [katoeba](https://github.com/sadhen/katoeba) and similar tools. Create new tools for advanced contributors and common users, like apps for smartphones.

### Android/iPhone application

About 12% of the site's visitors use an iPhone and about 7% use Android. A dedicated application could help them immensely.

**Deliverables**: A smartphone application for easy access to Tatoeba. Example features:
* Querying the online Tatoeba site
* Adding sentences
* Translating
* Performing typical corpus maintenance tasks
* Wall and sentence comments
* Recording voice
* Offline database access

Note: You are not expected to implement all of these features during a single GSoC. Depending on your skills, you might prepare a proposal for a basic set of features (if you don't have much experience in mobile development yet) or for a more complex or targeted application (if you do have experience and want to prepare something more feature-complete).

**Prerequisite knowledge**: Java and Android development or iPhone and iOS development; using web services.

### Streamlined linking of multiple sentences

Where multiple sentences in a source language have the same translation in the target language, make it easy to link those source sentences to the same target translation. Collecting the sentences that are likely to have the same translation could be as simple as presenting sentences in order of creation, since variants of a sentence that vary only in, e.g., the number of the pronoun (where the singular and plural forms of the second person map to the same word in English) are likely to be entered consecutively.

**Prerequisite knowledge**: JavaScript, possibly Java, possibly SQL

### Help bots

Just like [bots on Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Bots), have bots that help with maintenance and repetitive tasks such as fixing common mistakes, wrong flags, etc. As on Wikipedia, user accounts that are actually bots should be identified as such on the website. Ideally, create a library for interacting with the Tatoeba website that could be used as a base for creating bots.

External web applications
-------------------------

[CK's Temporary Tatoeba site](http://a4esl.com/temporary/tatoeba/) and similar pages.

### SRS deck generator

Spaced Repetition Systems such as Anki or Mnemosyne are popular tools for learning languages. However, preparing a good SRS deck is a time-consuming task. An automated way to generate a deck from a list of sentences (e.g. sentences on a Tatoeba list, sentences with a specific tag, etc.) would therefore help language learners.

**Deliverables**: an application (preferably a web-based one) that uses the Tatoeba database (for example, in the form of a weekly CSV data dump) to create SRS decks for major flash card applications. Example features:
* Generate a simple deck from a Tatoeba list, tag, search query.
* Generate an N+1-style deck based on the user's list of known words and the Tatoeba database. (The user gives a list of N words that they already know. The system chooses new sentences in which exactly one word is unknown and the rest belong to the already known set. Hence, N+1.)
* Generated decks have proper internal structure (e.g. for Anki decks: proper field scheme is used to store knowledge, so that editing is easy).
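
The N+1 selection rule described in the list above can be sketched in a few lines of Python. The function names are made up for this example, and the tokenizer is a deliberate simplification: it splits on word characters, which only works for languages written with spaces between words and ignores inflection (so "cat" and "cats" count as different words). Languages such as Japanese or Chinese would need a proper word segmenter.

```python
import re


def tokenize(text):
    """Very rough tokenizer: lowercase runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def n_plus_one(sentences, known_words):
    """Return (sentence, new_word) pairs where exactly one word is unknown."""
    known = {w.lower() for w in known_words}
    picks = []
    for text in sentences:
        unknown = tokenize(text) - known
        if len(unknown) == 1:
            picks.append((text, next(iter(unknown))))
    return picks


candidates = ["I like tea.", "I like green tea.", "Dogs chase cats."]
print(n_plus_one(candidates, ["I", "like", "tea"]))
# [('I like green tea.', 'green')]
```

Returning the new word together with the sentence makes it easy to fill a separate field of the flash card (for example, the word being learned) when the deck is written out.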

**Prerequisite knowledge**: any web stack, though Django or CppCMS are preferred; Python or C++; knowledge of SRS.

### Browsable graph of sentence links

Given a sentence, display the linked sentences up to a given depth as a [graph](https://en.wikipedia.org/wiki/Graph_%28data_structure%29) [like this](http://blog.tatoeba.org/2010/02/how-to-be-good-contributor-in-tatoeba.html#rule2). The main purpose of such a graph is to show users at a glance how Tatoeba is structured: the current interface doesn't provide such a view, yet it is important that users understand the actual structure of Tatoeba. This idea could be freely extended to a complete interface that allows linking and unlinking with a click, filtering by language, editing sentences, or whatever else you can think of.
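
Collecting the sentences to draw amounts to a depth-limited breadth-first search over the links. The Python sketch below assumes that links are available as pairs of sentence ids and are treated as undirected; the function name `neighbourhood` and the data layout are illustrative only.

```python
from collections import deque


def neighbourhood(links, start, max_depth):
    """Return ({sentence_id: depth}, edges) for sentences within max_depth of start."""
    # Build an undirected adjacency map from the link pairs.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search that stops expanding at max_depth.
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for other in adjacency.get(node, ()):
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)

    # Keep every link whose two endpoints were both reached.
    edges = {tuple(sorted((a, b))) for a, b in links
             if a in depth and b in depth}
    return depth, edges
```

The returned depths can drive the layout (for instance, concentric rings around the starting sentence), while the edge set is what a graph-drawing library would render.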

**Deliverables**: a web application or a client-side JavaScript program that provides a graph view of a group of sentences and allows manipulating them. The code can either use the database directly or use existing or planned APIs.

Note that this idea can also be implemented as part of the current PHP code base or as an experimental service for the new CppCMS site.

**Prerequisite knowledge**: any web stack, though integration into the current code is highly preferred.
