Version at: 23/02/2016, 15:15 vs. version at: 01/03/2016, 12:15

Version at: 01/03/2016, 12:15

# GSoC 2016 Project ideas

----

**Attention:** Unfortunately, Tatoeba was not accepted into Google Summer of Code this year. This page is kept for reference only.

----

This page lists project ideas for students who would like to take part in [Google Summer of Code 2016](https://developers.google.com/open-source/gsoc/) and be mentored by [Tatoeba](http://tatoeba.org).


## About Tatoeba

[Tatoeba](http://tatoeba.org) is a platform that aims to build a large **database of sentences** translated into as many languages as possible. The initial idea was a tool where you could search for words and get back example sentences containing them, along with their translations into the languages you want. The name Tatoeba comes from this concept: **tatoeba** means **for example** in Japanese.

You can browse the [blog](http://blog.tatoeba.org/) or the [wiki](http://en.wiki.tatoeba.org/) for more information about the project.



## For students: How to get started

If you are a student interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2016. The list of accepted mentoring organizations will be published on [**February 29**](https://developers.google.com/open-source/gsoc/timeline). **Until that date, Tatoeba is not officially part of GSoC 2016.**

Of course, this should not stop you from getting started on a project ahead of time. If you choose to do so, here's how to get started.

1. Make sure that you have read the [GSoC FAQ](https://developers.google.com/open-source/gsoc/faq) and that **you understand how the program works**. Please check the [calendar](https://developers.google.com/open-source/gsoc/timeline) for the various deadlines.

2. **Spend time using Tatoeba.** You need a good understanding of the current features. Note that we have a [dev website](https://dev.tatoeba.org) where you can test anything you want without worrying about polluting the production website.

3. If your project involves implementing something for the current version of Tatoeba, we expect you to show us that you understand our development process and our tools. The best way to do so is to actually try contributing some code. **Please follow our [guide for new developers](guide-for-new-developers)**.

4. If your project will not affect Tatoeba's code itself, still read the guide, but stop at the [Get in touch with us](http://en.wiki.tatoeba.org/articles/show/guide-for-new-developers#get-in-touch-with-us) section, and let us know your project idea.

5. Start preparing your [GSoC proposal](http://en.flossmanuals.net/GSoCStudentGuide/ch008_writing-a-proposal/). You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba. **Please read our [requirements regarding GSoC proposals](gsoc_application_requirements)**.

6. Last but not least, remember that the ideas listed on this page are only ideas. They are here to inspire you about projects you could do with us, but **you are in no way limited to these ideas**.



## Ideas

### 1. Mobile-friendly user interface

Around 40% of Tatoeba's visitors browse the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors using a mobile device.

If you are interested in this project, we recommend that you read our discussion from last year:

* [GSoC 2015 - Mobile friendly user interface](https://groups.google.com/forum/#!topic/tatoebaproject/ssK6N3T6in4)

### 2. Word requests

Imagine that you are learning a language and reading an article in that language. You come across new words and would like more example sentences that illustrate how they are used. You could go to Tatoeba and search for these words. But what if you don't find any sentences?

Tatoeba currently has no feature to support this situation. Our users would like to be able to easily create word requests: they would submit a word in a given language to ask other contributors to create sentences using this word.
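To make the idea more concrete, here is a minimal Python sketch of how a word request could be stored and automatically marked as answered when a matching sentence is added. Everything in it is hypothetical: the `WordRequest` class, its fields, the `record_new_sentence` hook and the naive tokenizer are illustrations, not part of Tatoeba's code. The tokenizer in particular assumes a language that separates words with spaces, so it would not work as-is for languages such as Japanese or Chinese.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordRequest:
    """Hypothetical record: someone asks for example sentences using `word`."""
    word: str
    lang: str  # three-letter language code, e.g. "eng" (assumed)
    requested_by: str
    fulfilled_by: List[int] = field(default_factory=list)  # matching sentence ids


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase runs of word characters and apostrophes.

    Only suitable for languages that put spaces between words.
    """
    return re.findall(r"[\w']+", text.lower())


def record_new_sentence(requests: List[WordRequest], sentence_id: int,
                        lang: str, text: str) -> List[WordRequest]:
    """Attach a newly added sentence to every request it answers.

    Returns the matched requests, e.g. so that requesters can be notified.
    """
    words = set(tokenize(text))
    matched = []
    for request in requests:
        if request.lang == lang and request.word.lower() in words:
            request.fulfilled_by.append(sentence_id)
            matched.append(request)
    return matched


if __name__ == "__main__":
    open_requests = [
        WordRequest("serendipity", "eng", "alice"),
        WordRequest("Zufall", "deu", "bob"),
    ]
    record_new_sentence(open_requests, 42, "eng", "It was pure serendipity.")
    print(open_requests[0].fulfilled_by)  # [42]
    print(open_requests[1].fulfilled_by)  # []
```

Matching exact word forms is a simplification: inflected forms (for example *serendipitous*) would need stemming or a dictionary lookup.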


### 3. Achievement system

The idea behind the achievement system is to give users specific tasks and reward them with a badge or medal when they complete them.

This system could help guide new contributors as they progressively learn about Tatoeba's features, or simply show them what to do after they register. At the moment, new users are mostly left on their own to figure out what to do next after registering.

This system can also make contributing more engaging for the more advanced contributors.


### 4. Improvement of communication tools

The Wall is the main place for members to communicate with each other publicly. However, unlike a regular forum, it has no categories: all the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, all the posts where people submit suggestions, or all the announcements from the admins.

The private messaging system is very old-fashioned. There is no notion of a discussion thread, so each message is displayed on its own, even if it is a reply to a previous message. This makes it rather impractical to have a conversation through private messages.

The goals of this project are:

1. to improve the Wall, possibly by replacing it with a forum or by adding a forum alongside it;
2. to change the private messaging system so that all the messages of a conversation are displayed together in a single thread, rather than as separate private messages.
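For the second goal, one common approach is to have each private message point to the message it replies to, and to display all messages that lead back to the same first message as one thread. The following Python sketch shows only the grouping step; the `Message` record and its `reply_to` field are assumptions for illustration and do not describe Tatoeba's actual database schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Message:
    """Hypothetical private message record."""
    id: int
    sender: str
    recipient: str
    sent_at: datetime
    body: str
    reply_to: Optional[int] = None  # id of the message being answered, if any


def thread_root(message: Message, by_id: Dict[int, Message]) -> int:
    """Follow reply_to links back to the first known message of the conversation."""
    seen = set()
    current = message
    # Stops at a message with no parent, a parent that no longer exists,
    # or a message already visited (guards against accidental reply cycles).
    while current.reply_to in by_id and current.id not in seen:
        seen.add(current.id)
        current = by_id[current.reply_to]
    return current.id


def group_into_threads(messages: List[Message]) -> Dict[int, List[Message]]:
    """Map each thread's root message id to its messages, oldest first."""
    by_id = {m.id: m for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m, by_id)].append(m)
    for thread in threads.values():
        thread.sort(key=lambda m: m.sent_at)
    return dict(threads)


if __name__ == "__main__":
    msgs = [
        Message(3, "alice", "bob", datetime(2016, 3, 1, 12, 0), "How are you?", reply_to=2),
        Message(1, "alice", "bob", datetime(2016, 3, 1, 10, 0), "Hi Bob!"),
        Message(4, "carol", "bob", datetime(2016, 3, 2, 9, 0), "About tags"),
        Message(2, "bob", "alice", datetime(2016, 3, 1, 11, 0), "Hi Alice!", reply_to=1),
    ]
    for root, thread in group_into_threads(msgs).items():
        print(root, [m.id for m in thread])  # prints "1 [1, 2, 3]", then "4 [4]"
```

A real implementation would more likely store a thread id on each message when it is sent, so that a conversation can be fetched with a single query instead of walking reply chains.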


### 5. Permissions management

A user's permissions are mostly based on their status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete others’ sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Prevent a user from adding new sentences while still allowing them to translate sentences.
* Restrict the languages in which a user can contribute.
* Prevent a user from posting comments on the Wall while still allowing them to comment on sentences.
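One possible design is to keep the status as a source of default permissions, but let each user have permissions explicitly granted or revoked, plus an optional list of allowed languages. The Python sketch below covers the three examples above. The permission names, the `UserPermissions` class and the exact defaults per status are hypothetical, only loosely based on the statuses described earlier.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Permission(Enum):
    """Hypothetical fine-grained actions, one per feature that can be restricted."""
    ADD_SENTENCE = "add_sentence"
    TRANSLATE = "translate"
    COMMENT_ON_SENTENCE = "comment_on_sentence"
    POST_ON_WALL = "post_on_wall"
    ADD_TAG = "add_tag"
    DELETE_OTHERS_SENTENCE = "delete_others_sentence"


# Assumed default permissions per status (admins simply get everything here).
_CONTRIBUTOR = {Permission.ADD_SENTENCE, Permission.TRANSLATE,
                Permission.COMMENT_ON_SENTENCE, Permission.POST_ON_WALL}
_ADVANCED = _CONTRIBUTOR | {Permission.ADD_TAG}
_MAINTAINER = _ADVANCED | {Permission.DELETE_OTHERS_SENTENCE}
STATUS_DEFAULTS = {
    "contributor": _CONTRIBUTOR,
    "advanced_contributor": _ADVANCED,
    "corpus_maintainer": _MAINTAINER,
    "admin": set(Permission),
}


@dataclass
class UserPermissions:
    status: str
    granted: Set[Permission] = field(default_factory=set)  # added on top of the status
    revoked: Set[Permission] = field(default_factory=set)  # removed from the status
    languages: Optional[Set[str]] = None  # None means "any language"

    def can(self, permission: Permission, lang: Optional[str] = None) -> bool:
        """Check whether the user may perform an action, optionally in a language."""
        effective = (STATUS_DEFAULTS[self.status] | self.granted) - self.revoked
        if permission not in effective:
            return False
        if lang is not None and self.languages is not None:
            return lang in self.languages
        return True


if __name__ == "__main__":
    # May translate and comment on sentences, only in English and French,
    # but may not add new sentences or post on the Wall.
    user = UserPermissions(
        "contributor",
        revoked={Permission.ADD_SENTENCE, Permission.POST_ON_WALL},
        languages={"eng", "fra"},
    )
    print(user.can(Permission.TRANSLATE, "fra"))     # True
    print(user.can(Permission.TRANSLATE, "deu"))     # False
    print(user.can(Permission.ADD_SENTENCE, "eng"))  # False
    print(user.can(Permission.COMMENT_ON_SENTENCE))  # True
    print(user.can(Permission.POST_ON_WALL))         # False
```

Keeping statuses as defaults means existing users would not need to be migrated one by one; the administration interface would then only store per-user exceptions.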


### 6. Audio

Tatoeba provides [audio](http://tatoeba.org/eng/sentences/with_audio) for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated, because audio was not originally at the core of the project.

Audio is nevertheless a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences of the textual corpus benefit from:

* There is no easy way to know (from the website) who the author of an audio file is, nor when it was contributed (cf. [GitHub issue #547](https://github.com/Tatoeba/tatoeba2/issues/547)).
* It is not possible to attach several recordings to the same sentence (for instance, to illustrate different accents of the same language).
* Updating and maintaining the audio is tedious. Contributors have to follow [a certain procedure](contribute-audio), then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
* It would also be nice if users could record audio directly from the web page (see this [proof of concept](https://webaudiodemos.appspot.com/AudioRecorder/index.html)).

The goal of this project would be to implement the features needed to better manage the audio content in Tatoeba.


### 7. Better export

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads](https://tatoeba.org/eng/downloads) page of the website. The CSV files are generated weekly, and third parties can reuse this data in their projects. However, doing so is not easy, because this approach has many limitations:

* Third parties must download the whole corpus. There is no way to download only part of it, for instance only the sentences in a given set of languages.
* We don’t provide diffs between versions. Even if only a small part of the corpus has changed, third parties must download the whole corpus for each new version.
* The format of the data is documented, yet subject to change at any time, and there is no way to notify third parties of such changes.
* Third parties must wait a week to get new data.
* Third parties must do some preliminary work to restructure the data the way they need it.
* There are probably other limitations.

We would love to see more projects reusing our data, but all this is definitely a barrier to entry for many of them. So what can we do to make our export files easier to use?
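To illustrate the first two limitations, here is a Python sketch of the work a third party currently has to do on its own side: load two full weekly exports, optionally keep only some languages, and compute which sentences were added, removed or modified. It assumes that `sentences.csv` contains one sentence per line as three tab-separated fields (sentence id, language code, text); please check the Downloads page for the exact format. A better export could, for example, publish this kind of diff or per-language files directly, so that consumers would not need to download everything.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Sentence = Tuple[str, str]  # (language code, text)


def load_sentences(lines: Iterable[str],
                   langs: Optional[Set[str]] = None) -> Dict[int, Sentence]:
    """Parse 'id<TAB>lang<TAB>text' lines, keeping only `langs` if given."""
    sentences = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # maxsplit=2 keeps any tab that might appear inside the text itself.
        sentence_id, lang, text = line.split("\t", 2)
        if langs is None or lang in langs:
            sentences[int(sentence_id)] = (lang, text)
    return sentences


def diff_exports(old: Dict[int, Sentence],
                 new: Dict[int, Sentence]) -> Dict[str, List[int]]:
    """List the ids of sentences added, removed or modified between two exports."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    # In practice these would be open("sentences.csv", encoding="utf-8") handles
    # for last week's and this week's downloads.
    last_week = ["1\teng\tHello.\n", "2\tfra\tBonjour.\n", "3\tdeu\tHallo.\n"]
    this_week = ["1\teng\tHello!\n", "2\tfra\tBonjour.\n", "4\tjpn\tこんにちは。\n"]
    print(diff_exports(load_sentences(last_week), load_sentences(this_week)))
    # {'added': [4], 'removed': [3], 'modified': [1]}
    print(load_sentences(this_week, langs={"fra"}))
    # {2: ('fra', 'Bonjour.')}
```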


### 8. Pytoeba

For a long time, there has been a desire to develop a new version of Tatoeba with another framework, with the main goal of improving its architecture, maintainability and performance. During GSoC 2014, one of our students worked on such a project: a new version of Tatoeba based on Django, a Python framework. Hence the project name "Pytoeba".

Pytoeba is a very ambitious project and, to date, it is not yet mature enough to be put into production. However, we still hope to bring it to life.


## Mentors

All the students will be mentored by BOTH **gillux** and **Trang**.

#### gillux

* [gillux](https://tatoeba.org/eng/user/profile/gillux) on Tatoeba
* [jiru](https://github.com/jiru) on GitHub
* gillux on IRC

#### Trang

* [TRANG](https://tatoeba.org/eng/user/profile/TRANG) on Tatoeba
* [trang](https://github.com/trang) on GitHub
* Trang on IRC


## Contact

#### Google group

Our Google group is called [tatoebaproject](https://groups.google.com/forum/#!forum/tatoebaproject). This is your main entry point to get in touch with us about Google Summer of Code.

#### IRC / XMPP

To interact with us in real time, you are welcome to join our IRC channel: [#tatoeba](irc://irc.freenode.net/tatoeba) on freenode. Note that we are more likely to be online on weekends than on weekdays.

If you don't want to install an IRC client, you can use the [Webchat](http://webchat.freenode.net?channels=tatoeba).

If you prefer not to use IRC, you can instead join our Jabber room at [tatoeba@chat.tatoeba.org](xmpp:tatoeba@chat.tatoeba.org?join).

#### Tatoeba Wall

The [Wall](http://tatoeba.org/wall/index) is where Tatoeba's community discusses things, asks questions, and exchanges ideas. We usually read all the messages on the Wall, so you can also get in touch with us there.

However, your message could go unnoticed if it gets buried under a lively discussion, so we recommend using the Google group first.
