Version at: 02/02/2017, 02:38 vs. version at: 02/02/2017, 13:02

The only difference between the two versions is that the newer one drops the draft notice (lines 3 to 7 of the older version): a horizontal rule, the sentence "This page is currently in draft.", and a second horizontal rule. All other lines are identical.

Version at: 02/02/2017, 13:02

# GSoC 2017 Project ideas


This page lists project ideas for students who would like to take part in [Google Summer of Code 2017](https://developers.google.com/open-source/gsoc/) and be mentored by [Tatoeba](http://tatoeba.org).

Note that at this stage Google has not yet chosen which organizations will participate in GSoC 2017. The list of accepted mentoring organizations will be published on [**February 27**](https://developers.google.com/open-source/gsoc/timeline). **Until that date, Tatoeba is not officially part of GSoC 2017.**



## About Tatoeba

Tatoeba is a large database of sentences and translations. Its content is ever-growing and results from the voluntary contributions of thousands of members.

Tatoeba provides a tool for you to see examples of how words are used in the context of a sentence. You specify words that interest you, and it returns sentences containing these words with their translations in the desired languages. The name Tatoeba ("for example" in Japanese) captures this concept.

Source: [https://tatoeba.org/eng/about](https://tatoeba.org/eng/about)



## How to get started

Are you a student interested in taking part in GSoC with Tatoeba as your mentoring organization? Here's how to get started.

1. Make sure that you have read the [GSoC FAQ](https://developers.google.com/open-source/gsoc/faq) and that **you understand how the program works**. Please check the [calendar](https://developers.google.com/open-source/gsoc/timeline) for the various deadlines.

2. **Spend time using Tatoeba.** You need a good understanding of its current features. Note that we have a [dev website](https://dev.tatoeba.org) where you can test anything you want without worrying about polluting the production website.

3. We expect you to show us that you understand our development process and our tools. The best way to do this is to actually **contribute some code**, by fixing a small bug or implementing a small enhancement. For this, follow our [guide for new developers](https://github.com/Tatoeba/tatoeba2/wiki/Joining-the-dev-team).

4. **Read our [requirements regarding GSoC proposals](gsoc_application_requirements).** Start thinking about what you would write in your proposal, and if you need any information, ask us!



## Ideas

Remember that the ideas listed on this page are only ideas. They are here to inspire you, but **you are in no way limited to these ideas**.


### Sentences wanted

Imagine that you are learning a language and reading an article in that language. You come across new words and would like more example sentences that illustrate how they are used. You could go to Tatoeba and search for them. But what if you don't find any sentences?

To address this, we made it possible for users to create vocabulary lists. When a user adds a vocabulary item for which no sentence exists, the item is listed on the ["Sentences wanted"](http://tatoeba.org/eng/vocabulary/add_sentences) page. From this page, contributors can browse vocabulary items with fewer than 10 sentences and create sentences for them.

This feature still needs a lot of improvement. For instance:

* There is no way to filter out or remove "spam" vocabulary items.
* There is no system to prioritize the most requested vocabulary items.
* Only sentences that contain an exact match of a vocabulary item are linked to it, so other forms of the same word (such as plurals or conjugated verbs) are missed.
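
To make the first two points more concrete, here is a minimal Python sketch of how the "Sentences wanted" list could skip spam and show the most requested items first. The `VocabularyItem` class and its fields (`requested_by`, `flagged_as_spam`, `sentence_count`) are hypothetical and do not reflect Tatoeba's actual schema. Only the limit of 10 sentences comes from the description above.

```python
from dataclasses import dataclass, field

# Limit taken from the "Sentences wanted" description: items with fewer
# than this many sentences are listed.
MAX_SENTENCES = 10


@dataclass
class VocabularyItem:
    # Hypothetical fields, for illustration only.
    text: str
    lang: str
    sentence_count: int
    requested_by: set[str] = field(default_factory=set)  # usernames
    flagged_as_spam: bool = False


def wanted_items(items: list[VocabularyItem]) -> list[VocabularyItem]:
    """Return non-spam items with fewer than MAX_SENTENCES sentences,
    most requested first, ties broken by fewest sentences."""
    candidates = [
        item for item in items
        if not item.flagged_as_spam and item.sentence_count < MAX_SENTENCES
    ]
    return sorted(candidates,
                  key=lambda item: (-len(item.requested_by), item.sentence_count))


items = [
    VocabularyItem("gato", "spa", 3, {"ana", "ben"}),
    VocabularyItem("cheap pills", "eng", 0, {"bot"}, flagged_as_spam=True),
    VocabularyItem("perro", "spa", 12, {"ana", "ben", "chloe"}),
    VocabularyItem("casa", "spa", 0, {"chloe"}),
]
print([item.text for item in wanted_items(items)])  # ['gato', 'casa']
```

The sketch ranks items by the number of distinct users who requested them rather than by raw additions, so a single account cannot push an item to the top on its own.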


### Achievement system

The idea behind the achievement system is to give users specific tasks to do and reward them with a badge/medal when they complete the tasks.

This system could guide new contributors as they progressively learn about Tatoeba's features, or simply tell them what to do after they register. At the moment, new users are largely left on their own to figure out what to do next.

It could also make contributing more engaging for advanced contributors.
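
One way to model achievements is as a list of named conditions evaluated against a user's activity counters. A badge is awarded the first time its condition holds. The minimal Python sketch below illustrates this approach. The `UserStats` counters and the example badges are invented for illustration and are not part of Tatoeba.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserStats:
    # Hypothetical activity counters; not Tatoeba's real data model.
    sentences_added: int = 0
    translations_added: int = 0
    comments_posted: int = 0


@dataclass(frozen=True)
class Achievement:
    name: str
    description: str
    condition: Callable[[UserStats], bool]


# Invented examples that walk a new contributor through basic features.
ACHIEVEMENTS = [
    Achievement("First sentence", "Add your first sentence.",
                lambda s: s.sentences_added >= 1),
    Achievement("Translator", "Translate 10 sentences.",
                lambda s: s.translations_added >= 10),
    Achievement("Commenter", "Post 5 comments on sentences.",
                lambda s: s.comments_posted >= 5),
]


def new_badges(stats: UserStats, earned: set[str]) -> list[str]:
    """Return the names of achievements whose condition is now met
    but which the user has not earned yet."""
    return [a.name for a in ACHIEVEMENTS
            if a.name not in earned and a.condition(stats)]


stats = UserStats(sentences_added=3, translations_added=10)
print(new_badges(stats, earned={"First sentence"}))  # ['Translator']
```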


### Improvement of communication tools

The Wall is the main place for members to communicate with each other publicly. However, unlike a regular forum, it has no categories: all topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, all the posts where people submit suggestions, or all the announcements from the admins.

The private messaging system is very old-fashioned. There is no notion of a discussion thread, so each message is displayed on its own, even when it is a reply to a previous message. This makes holding a conversation through private messages rather impractical.

The goals of this project are to:

1. improve the Wall, possibly by replacing it with a forum or by adding a forum alongside it;
2. change the private messaging system so that all the messages of a discussion are displayed in a single thread, rather than as separate private messages.
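
For the second goal, threads can be reconstructed if each message records which message it replies to. The minimal Python sketch below assumes a hypothetical `reply_to` field, which the current system does not have, and groups every message under the first message of its reply chain.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    id: int
    sender: str
    body: str
    reply_to: Optional[int] = None  # hypothetical: id of the message replied to


def group_into_threads(messages: list[Message]) -> dict[int, list[Message]]:
    """Map each thread's root message id to its messages, ordered by id."""
    by_id = {m.id: m for m in messages}

    def root_of(m: Message) -> int:
        # Follow reply_to links up to the first message of the conversation.
        # Stop at an unknown parent (e.g. deleted) or on a cycle in bad data.
        seen: set[int] = set()
        while m.reply_to is not None and m.reply_to in by_id and m.id not in seen:
            seen.add(m.id)
            m = by_id[m.reply_to]
        return m.id

    threads: dict[int, list[Message]] = defaultdict(list)
    for m in sorted(messages, key=lambda m: m.id):
        threads[root_of(m)].append(m)
    return dict(threads)


msgs = [
    Message(1, "ana", "Hi!"),
    Message(2, "ben", "Hello!", reply_to=1),
    Message(3, "ana", "How are you?", reply_to=2),
    Message(4, "chloe", "New topic"),
]
threads = group_into_threads(msgs)
print({root: [m.id for m in ms] for root, ms in threads.items()})
# {1: [1, 2, 3], 4: [4]}
```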


### Permissions management

A user's permissions are based mostly on their status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete other people's sentences, while other contributors cannot.

The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.

Here are examples of things that we cannot do at the moment, and that could be part of the project:

* Prevent a user from adding new sentences while still allowing them to translate sentences.
* Restrict the languages in which a user can contribute.
* Prevent a user from posting comments on the Wall only, while still allowing them to comment on sentences.
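
A finer-grained model could attach explicit permissions to individual users, on top of the defaults that come from their status. The Python sketch below shows one possible design. It uses per-user grants and revocations plus an optional language restriction, which is enough to express the three examples above. The permission names, the default sets, and the ISO 639-3 language codes are assumptions chosen for illustration. The admin status is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Perm(Enum):
    # Invented permission names, for illustration only.
    ADD_SENTENCE = auto()
    TRANSLATE = auto()
    TAG_SENTENCE = auto()
    COMMENT_SENTENCE = auto()
    POST_ON_WALL = auto()
    DELETE_OTHERS_SENTENCE = auto()


# Hypothetical defaults derived from the user's status (admin omitted).
STATUS_DEFAULTS: dict[str, set[Perm]] = {
    "contributor": {Perm.ADD_SENTENCE, Perm.TRANSLATE,
                    Perm.COMMENT_SENTENCE, Perm.POST_ON_WALL},
}
STATUS_DEFAULTS["advanced_contributor"] = STATUS_DEFAULTS["contributor"] | {Perm.TAG_SENTENCE}
STATUS_DEFAULTS["corpus_maintainer"] = (
    STATUS_DEFAULTS["advanced_contributor"] | {Perm.DELETE_OTHERS_SENTENCE})


@dataclass
class User:
    name: str
    status: str
    granted: set[Perm] = field(default_factory=set)  # extra permissions
    revoked: set[Perm] = field(default_factory=set)  # removed permissions
    allowed_langs: Optional[set[str]] = None         # None means any language


def can(user: User, perm: Perm, lang: Optional[str] = None) -> bool:
    """Status defaults, plus grants, minus revocations, then the optional
    language restriction when a language is given."""
    perms = (STATUS_DEFAULTS.get(user.status, set()) | user.granted) - user.revoked
    if perm not in perms:
        return False
    if lang is not None and user.allowed_langs is not None:
        return lang in user.allowed_langs
    return True


# May translate (French and English only) but not add sentences,
# and may comment on sentences but not post on the Wall.
alice = User("alice", "contributor",
             revoked={Perm.ADD_SENTENCE, Perm.POST_ON_WALL},
             allowed_langs={"fra", "eng"})
print(can(alice, Perm.TRANSLATE, "fra"))     # True
print(can(alice, Perm.TRANSLATE, "deu"))     # False
print(can(alice, Perm.ADD_SENTENCE, "fra"))  # False
print(can(alice, Perm.COMMENT_SENTENCE))     # True
```

An interface for managing permissions would then only need to edit `granted`, `revoked` and `allowed_langs` for each user.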


### Audio

Tatoeba provides [audio](http://tatoeba.org/eng/sentences/with_audio) for some sentences. The recordings are made by volunteers, but because audio was not initially at the core of the project, the process of contributing audio is somewhat complicated.

Audio was nonetheless a great addition, and Tatoeba has received more and more audio contributions over the years. However, the audio features are still quite limited.

For instance:

* It is not possible to attach several recordings to the same sentence (for instance, to illustrate different accents of the same language).
* Contributors cannot record audio directly through the web page (see this [proof of concept](https://webaudiodemos.appspot.com/AudioRecorder/index.html)).

The goal of this project would be to implement the features needed for better management of the audio content in Tatoeba.
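
Supporting several recordings per sentence mainly means modelling audio as a one-to-many relation from sentences to recordings. The in-memory Python sketch below illustrates this, assuming hypothetical fields (`speaker`, `accent`, `file_path`). Tatoeba's actual audio storage may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    # Hypothetical fields, for illustration only.
    sentence_id: int
    speaker: str
    accent: str
    file_path: str


class AudioIndex:
    """Keeps any number of recordings per sentence."""

    def __init__(self) -> None:
        self._by_sentence: dict[int, list[Recording]] = defaultdict(list)

    def add(self, rec: Recording) -> None:
        self._by_sentence[rec.sentence_id].append(rec)

    def recordings(self, sentence_id: int) -> list[Recording]:
        # .get avoids creating empty entries for sentences without audio.
        return list(self._by_sentence.get(sentence_id, []))

    def accents(self, sentence_id: int) -> set[str]:
        return {r.accent for r in self.recordings(sentence_id)}


index = AudioIndex()
index.add(Recording(1234, "speaker_a", "en-US", "audio/1234_a.mp3"))
index.add(Recording(1234, "speaker_b", "en-GB", "audio/1234_b.mp3"))
print(len(index.recordings(1234)), sorted(index.accents(1234)))
# 2 ['en-GB', 'en-US']
```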


### Better export

Tatoeba shares its data via CSV files that can be downloaded from the [Downloads](https://tatoeba.org/eng/downloads) page of the website. The CSV files are generated weekly, and third parties can reuse this data in their projects. However, doing so is not easy, because this approach has many limitations:

* Third parties must download the whole corpus. There is no way to download only part of it, for instance only the sentences in a given set of languages.
* We don't provide diffs between versions. Even if only a small part of the corpus has changed, third parties must download the whole corpus for each new version.
* The format of the data is documented but may change at any time, and there is no way to notify third parties about such changes.
* Third parties must wait up to a week to get new data.
* Third parties must do some preliminary work to restructure the data the way they need it.
* And probably more.

We would love to see more projects reusing our data, but all of this is a barrier to entry for many of them. So what can we do to make our export files easier to use?
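
To make the first two limitations concrete, the Python sketch below shows what a consumer currently has to do on their side. It filters a full sentences export down to a few languages and works out which sentences were added, removed or modified between two exports. It assumes a tab-separated layout of sentence id, language code and text, with no header row. Check the Downloads page for the actual, current format.

```python
import io
from typing import Optional, TextIO

# Assumed layout: one sentence per line, "id<TAB>lang<TAB>text", no header.
Sentence = tuple[str, str]  # (language code, text)


def load_sentences(f: TextIO, langs: Optional[set[str]] = None) -> dict[int, Sentence]:
    """Read an export, optionally keeping only the given language codes."""
    sentences: dict[int, Sentence] = {}
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # maxsplit=2 keeps any stray tab inside the text field.
        sentence_id, lang, text = line.split("\t", 2)
        if langs is None or lang in langs:
            sentences[int(sentence_id)] = (lang, text)
    return sentences


def diff_exports(old: dict[int, Sentence],
                 new: dict[int, Sentence]) -> tuple[list[int], list[int], list[int]]:
    """Return (added, removed, modified) sentence ids between two exports."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(i for i in old.keys() & new.keys() if old[i] != new[i])
    return added, removed, modified


old_export = io.StringIO("1\teng\tHello.\n2\tfra\tBonjour.\n3\tdeu\tHallo.\n")
new_export = io.StringIO("1\teng\tHello!\n3\tdeu\tHallo.\n4\tfra\tSalut.\n")
old = load_sentences(old_export, langs={"eng", "fra"})
new = load_sentences(new_export, langs={"eng", "fra"})
print(diff_exports(old, new))  # ([4], [2], [1])
```

If the export itself supported per-language files and published diffs like this, consumers could skip both steps.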


### App using Tatoeba's data

As mentioned in the "Better export" idea above, Tatoeba shares its data, and we are always happy to see projects reusing it. Do you have an idea for an app that you could build with it? That could be a GSoC project as well.

Just one thing: make sure you check this [list of projects that use our corpus](http://a4esl.org/temporary/tatoeba/links.html). Someone else may already have had the same idea, so look for the gaps and make something innovative!

Note that this project idea is closely tied to the "Better export" idea, except that it tackles the problem from a more concrete angle. Since you will be reusing our data, you will run into real situations that show how we could improve the way we share it. You will be in a better position to find out, or to help us find out, what we could do to make it easier for you (and other people like you) to get started with your projects.


### Quality

As a collaborative project open to anyone, Tatoeba constantly faces the challenge of providing good-quality data. Not all Tatoeba contributors are highly skilled in the language(s) they contribute in, so contributions are not always good: they may contain spelling or grammatical mistakes, they may not sound natural, and translations may be inaccurate or just plain wrong.

Although Tatoeba has some mechanisms to manage quality, they are not optimal. Users still need to make an extra effort to figure out when they can really rely on a sentence or translation.

What can we improve in our current system to provide sentences and translations of higher quality? How can we assess the quality of a sentence or a translation, so that language learners or third-party tools can easily filter out sentences of poor or uncertain quality?
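
As one possible answer to the second question, reviews from contributors could be combined into a per-sentence score that third parties can filter on. The Python sketch below is not an existing Tatoeba mechanism. The review values (+1 for "OK", 0 for "unsure", -1 for "not OK"), the extra weight given to native speakers, and the threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Review:
    # Hypothetical review record: +1 = OK, 0 = unsure, -1 = not OK.
    reviewer: str
    value: int
    reviewer_is_native: bool = False


NATIVE_WEIGHT = 2.0  # assumption: native speakers' reviews count double


def quality_score(reviews: list[Review]) -> Optional[float]:
    """Weighted mean of review values, in [-1, 1]; None if unreviewed."""
    if not reviews:
        return None
    weights = [NATIVE_WEIGHT if r.reviewer_is_native else 1.0 for r in reviews]
    total = sum(w * r.value for w, r in zip(weights, reviews))
    return total / sum(weights)


def reliable(sentence_ids: list[int],
             reviews_by_sentence: dict[int, list[Review]],
             threshold: float = 0.5) -> list[int]:
    """Keep sentences whose score reaches the threshold; unreviewed
    sentences are treated as uncertain and filtered out."""
    result = []
    for sid in sentence_ids:
        score = quality_score(reviews_by_sentence.get(sid, []))
        if score is not None and score >= threshold:
            result.append(sid)
    return result


reviews = {
    1: [Review("ana", 1, True), Review("ben", 1)],
    2: [Review("ana", -1, True), Review("ben", 1)],
    3: [],
}
print(reliable([1, 2, 3], reviews))  # [1]
```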



## Mentors

#### gillux

* Tatoeba: [gillux](https://tatoeba.org/eng/user/profile/gillux) 
* GitHub: [jiru](https://github.com/jiru)

#### halfdan

* Tatoeba: [halfdan](https://tatoeba.org/eng/user/profile/halfdan)
* GitHub: [halfdan](https://github.com/halfdan)

#### Trang

* Tatoeba: [Trang](https://tatoeba.org/eng/user/profile/Trang)
* GitHub: [trang](https://github.com/trang)

#### zachleigh

* Tatoeba: [zachleigh](https://tatoeba.org/eng/user/profile/zachleigh)
* GitHub: [zachleigh](https://github.com/zachleigh)


## Contact

#### Google group

Our Google group is called [tatoebaproject](https://groups.google.com/forum/#!forum/tatoebaproject). This is your main entry point for getting in touch with us about Google Summer of Code.

#### Gitter

To interact with us in real time, you are welcome to join our [Gitter chatroom](https://gitter.im/Tatoeba/tatoeba2). We may not be online when you drop by, but feel free to leave a message nonetheless.

#### Tatoeba Wall

The [Wall](http://tatoeba.org/wall/index) is where Tatoeba's community discusses things, asks questions, and exchanges ideas. We usually read all the messages on the Wall, so you can also get in touch with us there.

However, your message could go unnoticed if it gets buried under a lively discussion, so we recommend using the Google group first.
