Version at: 17/01/2016, 16:38

# GSoC 2016 Project ideas

This page lists project ideas for students who would like to take part in [Google Summer of Code 2016]( and be mentored by [Tatoeba](

## Warning

This page is still a work in progress.

## About Tatoeba

[Tatoeba]( is a platform that aims to build a large **database of sentences** translated into as many languages as possible. The initial idea was to have a tool in which you could search for certain words, and it would return example sentences containing these words with their translations in the desired languages. The name Tatoeba resulted from this concept, because **tatoeba** means **for example** in Japanese.

You can browse the [blog]( or the [wiki]( for more information about the project.

## Contact

* Google group: [tatoebaproject](!forum/tatoebaproject)
* IRC: [#tatoeba on freenode](irc://, [Webchat](
* XMPP: [Tatoeba conference room on](

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](

## How to submit ideas

If you would like to submit an idea and do not have access to the wiki, please [contact us](#contact) and send us the information below.
If you have access to the wiki, simply edit this page and add the information in the [Ideas](#ideas) section.

### Project title

#### Description
Brief description of the project. If you have already specified a lot of things about the project, do not write all the details here. Create a separate wiki page for it and only write a summary here, with a link to that wiki page.

#### Deliverables
What the student is expected to deliver at the end of the summer.

#### Prerequisite knowledge
Technical knowledge required to be able to complete the project. If you do not know what the prerequisite knowledge is for the project you are proposing, you can leave this blank; someone else will complete it.

#### Possible mentors
People from the team that may be able to mentor that idea. You can leave this section blank if you’re a student. Please only add a mentor’s name if you are that person or if the person explicitly agrees.

## A note for students

If you are a student and are interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2016. The list of accepted mentoring organizations will be published on [**February 29**]( Until that date, Tatoeba is not officially part of GSoC 2016.

Of course, this should not stop you from getting started on a project ahead of time. If you do so, we recommend the following.

1. Make sure that you have read the [GSoC FAQ]( and that you understand how the program works. Please check the [calendar]( for the various deadlines.
2. If the project you are interested in involves implementing code in the current version of Tatoeba, [install Tatoeba on your machine](, explore the code, and experiment with it.
3. If you would like to contribute code to get familiar with the project before GSoC, but don't know how to get started, you can read [this guide](guide-for-new-developers).
4. Start preparing your [proposal]( You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba. We have certain [requirements regarding GSoC proposals](gsoc_application_requirements).

Last but not least, remember that the ideas listed on this page are only ideas. They are here to give you inspiration on what projects you could do with us but you are in no way limited to these ideas.

## Ideas

### Mobile friendly user interface

#### Description
Around 30% of the visitors of Tatoeba browse the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors who are using a mobile device.

#### Deliverables
Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Word requests

#### Description

Imagine that you are learning a language, and you are reading an article in this foreign language. You come across a new word and would like to have more example sentences that illustrate its usage. You could go to Tatoeba and search for this word. But what if you don't find any sentence?

Tatoeba currently doesn't have any feature to support this situation. Our users would like to be able to easily create word requests, where they can submit a word in a certain language to request that other contributors create sentences around this word.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge


#### Possible mentors

gillux, Trang

### Achievement system

#### Description

The idea behind the achievement system is to give users specific tasks to do and reward them with a badge/medal when they complete the tasks.

This system can be useful to guide new contributors into learning about the features of Tatoeba progressively, or just to know what to do next after they register. Indeed, at the moment, after a user registers on Tatoeba, they are kind of left to themselves to figure out what to do next. 

This system can also make contributing more engaging for the more advanced contributors.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL, knowledge about gamification

#### Possible mentors


### Improvement of communication tools

#### Description

The Wall is the main place for members to communicate with each other publicly. There are, however, no categories like in a regular forum. All the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, or all the posts where people submit suggestions, or all the posts that are announcements from the admins.

The private messages are very old-fashioned. There is no notion of a discussion thread, and therefore each message is displayed alone, even if it was a reply to a previous message. This makes it rather impractical to have a conversation with private messages.

The goals of this project are:

1. to improve the Wall, or possibly replace it with a forum, or implement a forum in addition to the Wall.
2. to change the private messages system to display all the messages from the same discussion in a single thread, rather than separated into several private messages.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Permissions management

#### Description

The permissions of a user are based mostly on the user's status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you will have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Disallow a user from adding new sentences, but still allow them to translate sentences.
* Restrict the languages in which a user can contribute.
* Disallow a user from posting on the Wall only, while still allowing them to comment on sentences.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Audio

#### Description

Tatoeba provides [audio]( for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated. This is because audio was not at the core of the project.

Audio is still a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences in the textual corpus benefit from.

* There is no easy way to know (from the website) who is the author of an audio file, nor when it was contributed (cf. [GitHub issue #547](
* It is also not possible to attach several audio recordings to the same sentence (to illustrate different accents of the same language, for instance).
* It is a bit tedious to update and maintain the audio. Contributors have to follow [a certain procedure](contribute-audio), then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
* It would also be nice if users could record audio directly through the web page (see this [proof of concept](

The goal of this project would be to implement the necessary features for better management of the audio content in Tatoeba.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Better export files

#### Description

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads]( page of the website. Third parties can reuse this data in their projects. It is however not always easy for them to do so. We are indeed exporting all the possible data and whoever reuses this data needs to do some preliminary work to extract only what they need and restructure it in the way they need it.

We would love to see more projects reusing our data, but this is definitely an entry barrier for many of them. So what can we do to make our export files easier to use?

#### Deliverables


#### Prerequisite knowledge


#### Possible mentors

gillux, Trang

Version at: 17/01/2016, 18:33

# GSoC 2016 Project ideas

This page lists project ideas for students who would like to take part in [Google Summer of Code 2016]( and be mentored by [Tatoeba](

## Warning

This page is still a work in progress.

## About Tatoeba

[Tatoeba]( is a platform that aims to build a large **database of sentences** translated into as many languages as possible. The initial idea was to have a tool in which you could search for certain words, and it would return example sentences containing these words with their translations in the desired languages. The name Tatoeba resulted from this concept, because **tatoeba** means **for example** in Japanese.

You can browse the [blog]( or the [wiki]( for more information about the project.

## Contact

* Google group: [tatoebaproject](!forum/tatoebaproject)
* IRC: [#tatoeba on freenode](irc://, [Webchat](
* XMPP: [Tatoeba conference room on](

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](

## How to submit ideas

If you would like to submit an idea and do not have access to the wiki, please [contact us](#contact) and send us the information below.
If you have access to the wiki, simply edit this page and add the information in the [Ideas](#ideas) section.

### Project title

#### Description
Brief description of the project. If you have already specified a lot of things about the project, do not write all the details here. Create a separate wiki page for it and only write a summary here, with a link to that wiki page.

#### Deliverables
What the student is expected to deliver at the end of the summer.

#### Prerequisite knowledge
Technical knowledge required to be able to complete the project. If you do not know what the prerequisite knowledge is for the project you are proposing, you can leave this blank; someone else will complete it.

#### Possible mentors
People from the team that may be able to mentor that idea. You can leave this section blank if you’re a student. Please only add a mentor’s name if you are that person or if the person explicitly agrees.

## A note for students

If you are a student and are interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2016. The list of accepted mentoring organizations will be published on [**February 29**]( Until that date, Tatoeba is not officially part of GSoC 2016.

Of course, this should not stop you from getting started on a project ahead of time. If you do so, we recommend the following.

1. Make sure that you have read the [GSoC FAQ]( and that you understand how the program works. Please check the [calendar]( for the various deadlines.
2. If the project you are interested in involves implementing code in the current version of Tatoeba, [install Tatoeba on your machine](, explore the code, and experiment with it.
3. If you would like to contribute code to get familiar with the project before GSoC, but don't know how to get started, you can read [this guide](guide-for-new-developers).
4. Start preparing your [proposal]( You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba. We have certain [requirements regarding GSoC proposals](gsoc_application_requirements).

Last but not least, remember that the ideas listed on this page are only ideas. They are here to give you inspiration on what projects you could do with us but you are in no way limited to these ideas.

## Ideas

### Mobile friendly user interface

#### Description
Around 30% of the visitors of Tatoeba browse the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors who are using a mobile device.

#### Deliverables
Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Word requests

#### Description

Imagine that you are learning a language, and you are reading an article in this foreign language. You come across a new word and would like to have more example sentences that illustrate its usage. You could go to Tatoeba and search for this word. But what if you don't find any sentence?

Tatoeba currently doesn't have any feature to support this situation. Our users would like to be able to easily create word requests, where they can submit a word in a certain language to request that other contributors create sentences around this word.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge


#### Possible mentors

gillux, Trang

### Achievement system

#### Description

The idea behind the achievement system is to give users specific tasks to do and reward them with a badge/medal when they complete the tasks.

This system can be useful to guide new contributors into learning about the features of Tatoeba progressively, or just to know what to do next after they register. Indeed, at the moment, after a user registers on Tatoeba, they are kind of left to themselves to figure out what to do next. 

This system can also make contributing more engaging for the more advanced contributors.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL, knowledge about gamification

#### Possible mentors


### Improvement of communication tools

#### Description

The Wall is the main place for members to communicate with each other publicly. There are, however, no categories like in a regular forum. All the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, or all the posts where people submit suggestions, or all the posts that are announcements from the admins.

The private messages are very old-fashioned. There is no notion of a discussion thread, and therefore each message is displayed alone, even if it was a reply to a previous message. This makes it rather impractical to have a conversation with private messages.

The goals of this project are:

1. to improve the Wall, or possibly replace it with a forum, or implement a forum in addition to the Wall.
2. to change the private messages system to display all the messages from the same discussion in a single thread, rather than separated into several private messages.
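
As a rough illustration of the second goal, the sketch below groups messages into threads by following reply links back to the first message of a discussion. It is written in Python for brevity and is not taken from Tatoeba's code: the `id`, `reply_to`, and `sent` fields are hypothetical, and it assumes each private message could record which message it replies to.

```python
from collections import defaultdict


def group_into_threads(messages):
    """Group messages into threads keyed by the id of the thread's first message.

    Each message is a dict with hypothetical keys:
      "id"       - unique message id
      "reply_to" - id of the message it replies to, or None
      "sent"     - sortable timestamp
    """
    by_id = {m["id"]: m for m in messages}

    def root_of(message):
        # Walk up the reply chain until a message has no known parent.
        # The `seen` set guards against accidental reply cycles.
        seen = set()
        while message["reply_to"] in by_id and message["id"] not in seen:
            seen.add(message["id"])
            message = by_id[message["reply_to"]]
        return message["id"]

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m)].append(m)

    # Display order inside a thread: oldest message first.
    for thread in threads.values():
        thread.sort(key=lambda m: m["sent"])
    return dict(threads)


messages = [
    {"id": 1, "reply_to": None, "sent": 1},
    {"id": 3, "reply_to": 2, "sent": 3},
    {"id": 2, "reply_to": 1, "sent": 2},
    {"id": 4, "reply_to": None, "sent": 4},
]
threads = group_into_threads(messages)
```

A message whose parent has been deleted simply starts its own thread, which seems a reasonable default for a first version.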

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Permissions management

#### Description

The permissions of a user are based mostly on the user's status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you will have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Disallow a user from adding new sentences, but still allow them to translate sentences.
* Restrict the languages in which a user can contribute.
* Disallow a user from posting on the Wall only, while still allowing them to comment on sentences.
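
To make the idea more concrete, here is a minimal sketch of one possible design: status-based defaults combined with per-user overrides. It is written in Python for brevity and does not reflect Tatoeba's actual code. All permission names, the `User` class, and the `can` function are hypothetical, and the exact defaults per status (for example, that admins can do everything) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical permission names, chosen for illustration only.
ADD_SENTENCE = "add_sentence"
TRANSLATE = "translate_sentence"
COMMENT_SENTENCE = "comment_on_sentence"
POST_WALL = "post_on_wall"
ADD_TAG = "add_tag"
DELETE_SENTENCE = "delete_sentence"

BASE = {ADD_SENTENCE, TRANSLATE, COMMENT_SENTENCE, POST_WALL}

# Defaults per status. The tag and delete rules follow the description above;
# the rest (e.g. what admins get) is an assumption.
STATUS_DEFAULTS = {
    "contributor": BASE,
    "advanced_contributor": BASE | {ADD_TAG},
    "corpus_maintainer": BASE | {ADD_TAG, DELETE_SENTENCE},
    "admin": BASE | {ADD_TAG, DELETE_SENTENCE},
}


@dataclass
class User:
    name: str
    status: str
    # Permissions taken away from this specific user.
    revoked: Set[str] = field(default_factory=set)
    # Languages the user may contribute in; None means no restriction.
    allowed_languages: Optional[Set[str]] = None


def can(user, permission, lang=None):
    """Return True if the user may perform the action, optionally in a language."""
    if permission in user.revoked:
        return False
    if permission not in STATUS_DEFAULTS[user.status]:
        return False
    if lang is not None and user.allowed_languages is not None:
        return lang in user.allowed_languages
    return True


# The three examples from the list above:
no_new_sentences = User("alice", "contributor", revoked={ADD_SENTENCE})
french_only = User("bob", "advanced_contributor", allowed_languages={"fra"})
no_wall = User("carol", "contributor", revoked={POST_WALL})
```

With this layout, `can(no_new_sentences, TRANSLATE)` is allowed while `can(no_new_sentences, ADD_SENTENCE)` is not. Keeping the overrides separate from the status means a user's status can change without losing individual restrictions.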

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Audio

#### Description

Tatoeba provides [audio]( for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated. This is because audio was not at the core of the project.

Audio is still a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences in the textual corpus benefit from.

* There is no easy way to know (from the website) who is the author of an audio file, nor when it was contributed (cf. [GitHub issue #547](
* It is also not possible to attach several audio recordings to the same sentence (to illustrate different accents of the same language, for instance).
* It is a bit tedious to update and maintain the audio. Contributors have to follow [a certain procedure](contribute-audio), then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
* It would also be nice if users could record audio directly through the web page (see this [proof of concept](

The goal of this project would be to implement the necessary features for better management of the audio content in Tatoeba.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

#### Possible mentors

### Better export

#### Description

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads]( page of the website. CSVs are generated on a weekly basis. Third parties can reuse this data in their projects. However, it is not easy to do so because this approach has many limitations:

* Third parties must download the whole corpus. There is no way to download only a part of it, for instance only the sentences in a given set of languages.
* We don’t provide diffs between versions. Even if a relatively small part of the corpus has changed, third parties must download the whole corpus again for each new version.
* The format of the data is documented, yet subject to change at any time. There is no way to notify third parties about this.
* Third parties must wait a week to get new data.
* Third parties must do some preliminary work to restructure the data the way they need it.
* Probably other things.

We would love to see more projects reusing our data, but all this is definitely an entry barrier for many of them. So what can we do to make our export files easier to use?
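
As a starting point for discussion, the sketch below shows the kind of preliminary work a third party has to do today: keeping only some languages and computing the difference between two exports. It is written in Python and assumes a tab-separated layout of `id`, `language code`, `text` per line; that layout, like the function names `read_sentences` and `diff_exports`, is an assumption made for this example and should be checked against the documented format.

```python
import csv
import io


def read_sentences(stream, languages=None):
    """Read tab-separated (id, lang, text) rows into a dict keyed by sentence id.

    If `languages` is given, only sentences in those languages are kept.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    sentences = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sentence_id, lang, text = row
        if languages is None or lang in languages:
            sentences[int(sentence_id)] = (lang, text)
    return sentences


def diff_exports(old, new):
    """Return (added, removed, changed) between two exports."""
    added = {i: new[i] for i in new.keys() - old.keys()}
    removed = sorted(old.keys() - new.keys())
    changed = {i: new[i] for i in new.keys() & old.keys() if new[i] != old[i]}
    return added, removed, changed


# Two tiny in-memory "exports" standing in for last week's and this week's files.
last_week = io.StringIO("1\teng\tHello.\n2\tfra\tBonjour.\n3\tdeu\tHallo.\n")
this_week = io.StringIO("1\teng\tHello!\n3\tdeu\tHallo.\n4\tfra\tSalut.\n")

old = read_sentences(last_week, languages={"eng", "fra"})
new = read_sentences(this_week, languages={"eng", "fra"})
added, removed, changed = diff_exports(old, new)
```

If the export side offered this filtering and diffing directly, third parties would only need to download the `added`, `removed`, and `changed` parts for the languages they care about.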

#### Deliverables


#### Prerequisite knowledge


#### Possible mentors

gillux, Trang

