Version at: 20/01/2016, 18:00 vs. version at: 26/01/2016, 19:22

Version at: 20/01/2016, 18:00

# GSoC 2016 Project ideas

This page lists project ideas for students who would like to take part in [Google Summer of Code 2016](https://developers.google.com/open-source/gsoc/) and be mentored by [Tatoeba](http://tatoeba.org).

## Warning

This page is still a work in progress.

## About Tatoeba

[Tatoeba](http://tatoeba.org) is a platform that aims to build a large **database of sentences** translated into as many languages as possible. The initial idea was to have a tool in which you could search certain words, and it would return example sentences containing these words with their translations in the desired languages. The name Tatoeba resulted from this concept, because **tatoeba** means **for example** in Japanese.

You can browse the [blog](http://blog.tatoeba.org/) or the [wiki](http://en.wiki.tatoeba.org/) for more information about the project.


## Contact

* Google group: [tatoebaproject](https://groups.google.com/forum/#!forum/tatoebaproject)
* IRC: [#tatoeba on freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join)

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](http://tatoeba.org/wall/index).

## How to submit ideas

If you would like to submit an idea and do not have access to the wiki, please [contact us](#contact) and send us the information below.
If you have access to the wiki, simply edit this page and add the information in the [Ideas](#ideas) section.

<pre>
### Project title

#### Description
Brief description of the project. If you have already specified a lot of things about the project, do not write all the details here. Create a separate wiki page for it and only write a summary here, with a link to that wiki page.

#### Deliverables
What the student is expected to deliver by the end of the summer.

#### Prerequisite knowledge
Technical knowledge required to complete the project. If you do not know what the prerequisite knowledge is for the project you are proposing, you can leave this blank; someone else will complete it.

#### Possible mentors
People from the team who may be able to mentor this idea. You can leave this section blank if you’re a student. Please only add a mentor’s name if you are that person or if the person explicitly agrees.
</pre>



## A note for students

If you are a student interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2016. The list of accepted mentoring organizations will be published on [**February 29**](https://developers.google.com/open-source/gsoc/timeline). Until that date, Tatoeba is not officially part of GSoC 2016.

Of course, this should not stop you from getting started on a project ahead of time. If you do so, we recommend the following:

1. Make sure that you have read the [GSoC FAQ](https://developers.google.com/open-source/gsoc/faq) and that you understand how the program works. Please check the [calendar](https://developers.google.com/open-source/gsoc/timeline) for the various deadlines.
2. If the project you are interested in involves implementing code in the current version of Tatoeba, [install Tatoeba on your machine](https://github.com/Tatoeba/tatoeba2#installing-tatoeba), explore the code, and experiment with it.
3. Start preparing your [proposal](http://en.flossmanuals.net/GSoCStudentGuide/ch008_writing-a-proposal/). You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba. We have certain [requirements regarding GSoC proposals](gsoc_application_requirements).
4. If you would like to contribute code to get familiar with the project before GSoC, but don't know how to get started, you can read [this guide](guide-for-new-developers).

Last but not least, remember that the ideas listed on this page are only ideas. They are here to give you inspiration on what projects you could do with us but you are in no way limited to these ideas.

## Ideas

### Mobile friendly user interface

#### Description
Around 40% of Tatoeba's visitors browse the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors on mobile devices.


#### Deliverables
Implementation in Tatoeba's source code.

#### Prerequisite knowledge
PHP, HTML, CSS

#### Possible mentors
Trang

### Word requests

#### Description

Imagine that you are learning a language and reading an article in that language. You come across a new word and would like more example sentences that illustrate its usage. You could go to Tatoeba and search for this word. But what if you don't find any sentences?

Tatoeba currently doesn't have any feature to support this situation. Our users would like to be able to easily create word requests, where they can submit a word in a certain language to request that other contributors create sentences around this word.


#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

CakePHP

#### Possible mentors

gillux, Trang


### Achievement system

#### Description

The idea behind the achievement system is to give users specific tasks to do and reward them with a badge/medal when they complete the tasks.

This system can be useful to guide new contributors through the features of Tatoeba progressively, or simply to show them what to do next after they register. At the moment, after a user registers on Tatoeba, they are left to figure out what to do next on their own.

This system can also make contributing more engaging for the more advanced contributors.


#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL, knowledge about gamification

#### Possible mentors

Trang


### Improvement of communication tools

#### Description

The Wall is the main place for members to communicate with each other publicly. However, unlike a regular forum, it has no categories: all the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, all the posts where people submit suggestions, or all the announcements from the admins.

The private messaging system is dated. There is no notion of a discussion thread, so each message is displayed on its own, even if it is a reply to a previous message. This makes it rather impractical to hold a conversation through private messages.

The goals of this project are:

1. to improve the Wall, possibly replace it with a forum, or implement a forum in addition to the Wall;
2. to change the private messaging system so that all the messages from the same discussion are displayed in a single thread, rather than as separate private messages.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL

#### Possible mentors
gillux


### Permissions management

#### Description

The permissions of a user are based mostly on the user's status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Prevent a user from adding new sentences, but still allow them to translate sentences.
* Restrict the languages in which a user can contribute.
* Prevent a user from posting on the Wall, but still allow them to comment on sentences.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL

#### Possible mentors
Trang


### Audio

#### Description

Tatoeba provides [audio](http://tatoeba.org/eng/sentences/with_audio) for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated, because audio was not at the core of the project.

Audio is still a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences in the textual corpus benefit from.

* There is no easy way to know (from the website) who the author of an audio file is, nor when it was contributed (cf. [GitHub issue #547](https://github.com/Tatoeba/tatoeba2/issues/547)).
* It is not possible to attach several recordings to the same sentence (to illustrate different accents of the same language, for instance).
* It is a bit tedious to update and maintain the audio. Contributors have to follow [a certain procedure](contribute-audio), then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
* It would also be nice if users could record audio directly through the web page (see this [proof of concept](https://webaudiodemos.appspot.com/AudioRecorder/index.html)).

The goal of this project would be to implement the features needed for better management of the audio content in Tatoeba.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP

#### Possible mentors
Trang



### Better export

#### Description

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads](https://tatoeba.org/eng/downloads) page of the website. The CSV files are generated on a weekly basis. Third parties can reuse this data in their projects. However, it's not easy to do so because this approach has many limitations:

* Third parties must download the whole corpus. There is no way to download only a part of it, for instance only the sentences in a given set of languages.
* We don’t provide diffs between versions. Even if a relatively small part of the corpus has changed, third parties must download the whole corpus again for each new version.
* The format of the data is documented, yet subject to change at any time. There is no way to notify third parties about this.
* Third parties must wait a week to get new data.
* Third parties must do some preliminary work to restructure the data the way they need it.
* Probably other things.

We would love to see more projects reusing our data, but all this is definitely an entry barrier for many of them. So what can we do to make our export files easier to use?

#### Deliverables

?

#### Prerequisite knowledge

?

#### Possible mentors

gillux, Trang

Version at: 26/01/2016, 19:22

# GSoC 2016 Project ideas

This page lists project ideas for students who would like to take part in [Google Summer of Code 2016](https://developers.google.com/open-source/gsoc/) and be mentored by [Tatoeba](http://tatoeba.org).

## Warning

This page is still a work in progress.

## About Tatoeba

[Tatoeba](http://tatoeba.org) is a platform that aims to build a large **database of sentences** translated into as many languages as possible. The initial idea was to have a tool in which you could search certain words, and it would return example sentences containing these words with their translations in the desired languages. The name Tatoeba resulted from this concept, because **tatoeba** means **for example** in Japanese.

You can browse the [blog](http://blog.tatoeba.org/) or the [wiki](http://en.wiki.tatoeba.org/) for more information about the project.


## Contact

* Google group: [tatoebaproject](https://groups.google.com/forum/#!forum/tatoebaproject)
* IRC: [#tatoeba on freenode](irc://irc.freenode.net/tatoeba), [Webchat](http://webchat.freenode.net?channels=tatoeba)
* XMPP: [Tatoeba conference room on chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join)

To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the [Tatoeba Wall page](http://tatoeba.org/wall/index).



## A note for students

If you are a student interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2016. The list of accepted mentoring organizations will be published on [**February 29**](https://developers.google.com/open-source/gsoc/timeline). Until that date, Tatoeba is not officially part of GSoC 2016.

Of course, this should not stop you from getting started on a project ahead of time. If you do so, we recommend the following:

1. Make sure that you have read the [GSoC FAQ](https://developers.google.com/open-source/gsoc/faq) and that you understand how the program works. Please check the [calendar](https://developers.google.com/open-source/gsoc/timeline) for the various deadlines.
2. If the project you are interested in involves implementing code in the current version of Tatoeba, [install Tatoeba on your machine](https://github.com/Tatoeba/tatoeba2#installing-tatoeba), explore the code, and experiment with it.
3. Start preparing your [proposal](http://en.flossmanuals.net/GSoCStudentGuide/ch008_writing-a-proposal/). You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba. We have certain [requirements regarding GSoC proposals](gsoc_application_requirements).
4. If you would like to contribute code to get familiar with the project before GSoC, but don't know how to get started, you can read [this guide](guide-for-new-developers).

Last but not least, remember that the ideas listed on this page are only ideas. They are here to give you inspiration on what projects you could do with us but you are in no way limited to these ideas.

## Ideas

### Mobile friendly user interface

#### Description
Around 40% of Tatoeba's visitors browse the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors on mobile devices.


#### Deliverables
Implementation in Tatoeba's source code.

#### Prerequisite knowledge
PHP, HTML, CSS

#### Possible mentors
Trang

### Word requests

#### Description

Imagine that you are learning a language and reading an article in that language. You come across a new word and would like more example sentences that illustrate its usage. You could go to Tatoeba and search for this word. But what if you don't find any sentences?

Tatoeba currently doesn't have any feature to support this situation. Our users would like to be able to easily create word requests, where they can submit a word in a certain language to request that other contributors create sentences around this word.


#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge

CakePHP

#### Possible mentors

gillux, Trang


### Achievement system

#### Description

The idea behind the achievement system is to give users specific tasks to do and reward them with a badge/medal when they complete the tasks.

This system can be useful to guide new contributors through the features of Tatoeba progressively, or simply to show them what to do next after they register. At the moment, after a user registers on Tatoeba, they are left to figure out what to do next on their own.

This system can also make contributing more engaging for the more advanced contributors.


#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL, knowledge about gamification

#### Possible mentors

Trang


### Improvement of communication tools

#### Description

The Wall is the main place for members to communicate with each other publicly. However, unlike a regular forum, it has no categories: all the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, all the posts where people submit suggestions, or all the announcements from the admins.

The private messaging system is dated. There is no notion of a discussion thread, so each message is displayed on its own, even if it is a reply to a previous message. This makes it rather impractical to hold a conversation through private messages.

The goals of this project are:

1. to improve the Wall, possibly replace it with a forum, or implement a forum in addition to the Wall;
2. to change the private messaging system so that all the messages from the same discussion are displayed in a single thread, rather than as separate private messages.
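
As a rough illustration of the second goal, the sketch below groups messages into threads by following each reply back to the first message of its discussion. It is written in Python for brevity rather than in CakePHP, and the `id`, `reply_to` and `sent` fields are hypothetical: the page does not say how private messages are stored, so treat this as one possible approach under those assumptions.

```python
from collections import defaultdict


def group_into_threads(messages):
    """Group message ids by the root message of their discussion.

    Each message is a dict with hypothetical keys:
    'id', 'reply_to' (id of the parent message, or None) and 'sent'.
    """
    parent = {m["id"]: m["reply_to"] for m in messages}

    def root_of(message_id):
        # Walk up the reply chain; `seen` guards against accidental cycles.
        seen = set()
        while parent.get(message_id) is not None and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    # Sort by sending time so each thread reads in chronological order.
    for m in sorted(messages, key=lambda m: m["sent"]):
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads)
```

With such a grouping, the inbox could list one entry per root message and display the replies underneath it.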

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL

#### Possible mentors
gillux


### Permissions management

#### Description

The permissions of a user are based mostly on the user's status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Prevent a user from adding new sentences, but still allow them to translate sentences.
* Restrict the languages in which a user can contribute.
* Prevent a user from posting on the Wall, but still allow them to comment on sentences.
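
To make the three examples above more concrete, here is a minimal Python sketch of per-user restrictions that could sit on top of the existing status-based checks (which are not modelled here). The class, its fields and the action names such as `add_sentence` are all hypothetical and chosen only for illustration; the actual design is part of the project.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class UserRestrictions:
    # Actions this user is explicitly barred from (hypothetical names).
    denied_actions: Set[str] = field(default_factory=set)
    # Languages the user may contribute in; None means no restriction.
    allowed_languages: Optional[Set[str]] = None

    def can(self, action: str, language: Optional[str] = None) -> bool:
        """Return True if the action is allowed, optionally in a given language."""
        if action in self.denied_actions:
            return False
        if language is not None and self.allowed_languages is not None:
            return language in self.allowed_languages
        return True


# A user who may translate but not add sentences, and only in French.
user = UserRestrictions(denied_actions={"add_sentence"}, allowed_languages={"fra"})
print(user.can("add_sentence", "fra"))
print(user.can("translate_sentence", "fra"))
```

Keeping denied actions and allowed languages as separate fields lets each restriction from the list be toggled independently of the others.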

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP, MySQL

#### Possible mentors
Trang


### Audio

#### Description

Tatoeba provides [audio](http://tatoeba.org/eng/sentences/with_audio) for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated, because audio was not at the core of the project.

Audio is still a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences in the textual corpus benefit from.

* There is no easy way to know (from the website) who the author of an audio file is, nor when it was contributed (cf. [GitHub issue #547](https://github.com/Tatoeba/tatoeba2/issues/547)).
* It is not possible to attach several recordings to the same sentence (to illustrate different accents of the same language, for instance).
* It is a bit tedious to update and maintain the audio. Contributors have to follow [a certain procedure](contribute-audio), then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
* It would also be nice if users could record audio directly through the web page (see this [proof of concept](https://webaudiodemos.appspot.com/AudioRecorder/index.html)).

The goal of this project would be to implement the features needed for better management of the audio content in Tatoeba.

#### Deliverables

Implementation in Tatoeba's source code.

#### Prerequisite knowledge
CakePHP

#### Possible mentors
Trang



### Better export

#### Description

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads](https://tatoeba.org/eng/downloads) page of the website. The CSV files are generated on a weekly basis. Third parties can reuse this data in their projects. However, it's not easy to do so because this approach has many limitations:

* Third parties must download the whole corpus. There is no way to download only a part of it, for instance only the sentences in a given set of languages.
* We don’t provide diffs between versions. Even if a relatively small part of the corpus has changed, third parties must download the whole corpus again for each new version.
* The format of the data is documented, yet subject to change at any time. There is no way to notify third parties about this.
* Third parties must wait a week to get new data.
* Third parties must do some preliminary work to restructure the data the way they need it.
* Probably other things.

We would love to see more projects reusing our data, but all this is definitely an entry barrier for many of them. So what can we do to make our export files easier to use?
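
To illustrate the kind of preliminary work the list above refers to, here is a minimal Python sketch of what a third party might write today: filtering an export by language and computing the difference between two weekly exports. It assumes a tab-separated file whose rows are a sentence id, a language code and the sentence text; that layout and all function names are assumptions made for this example, not a description of Tatoeba's export tooling.

```python
import csv
import io


def read_sentences(stream):
    """Parse tab-separated rows (id, language, text) into a dict keyed by id."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {int(row[0]): (row[1], row[2]) for row in reader if len(row) >= 3}


def filter_languages(sentences, languages):
    """Keep only the sentences whose language code is in `languages`."""
    wanted = set(languages)
    return {sid: s for sid, s in sentences.items() if s[0] in wanted}


def diff_exports(old, new):
    """Return sorted lists of (added, removed, changed) sentence ids."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(sid for sid in old.keys() & new.keys() if old[sid] != new[sid])
    return added, removed, changed


# Two tiny in-memory "exports" standing in for consecutive weekly files.
last_week = read_sentences(io.StringIO("1\teng\tHello.\n2\tfra\tBonjour.\n"))
this_week = read_sentences(io.StringIO("1\teng\tHello!\n3\tdeu\tHallo.\n"))
print(diff_exports(last_week, this_week))
```

An export service that offered per-language files and diffs directly would spare every third party from reimplementing steps like these.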

#### Deliverables

?

#### Prerequisite knowledge

?

#### Possible mentors

gillux, Trang
