Notice
This page shows a previous version of the article.
GSoC 2015 Project ideas
This page lists project ideas for students who would like to take part in Google Summer of Code 2015 and be mentored by Tatoeba.
About Tatoeba
Tatoeba is a platform that aims to build a large database of sentences translated into as many languages as possible. The initial idea was to have a tool in which you could search for certain words, and it would return example sentences containing these words with their translations in the desired languages. The name Tatoeba comes from this concept: "tatoeba" means "for example" in Japanese.
You can browse the blog or the wiki for more information about the project.
Contact
- Google group: tatoebaproject
- IRC: #tatoeba on freenode, Webchat
- XMPP: Tatoeba conference room on chat.tatoeba.org
To get a feeling for the discussions taking place within the Tatoeba contributor community, visit the Tatoeba Wall page.
How to submit ideas
If you would like to submit an idea and do not have access to the wiki, please contact us and send us the information below. If you have access to the wiki, simply edit this page and add the information in the Ideas section.
### Project title
#### Description
Brief description of the project. If you have already specified a lot of things about the project, do not write all the details here. Create a separate wiki page for it and only write a summary here, with a link to that wiki page.
#### Deliverables
What the student is expected to deliver at the end of the summer.
#### Prerequisite knowledge
Technical knowledge required to be able to complete the project. If you do not know what the prerequisite knowledge is for the project you are proposing, you can leave this blank; someone else will complete it.
#### Possible mentors
People from the team who may be able to mentor that idea. You can leave this section blank if you're a student. Please only add a mentor's name if you are that person or if the person explicitly agrees.
A note for students
If you are a student and are interested in working on one of the projects listed below, note that at this stage Google has not yet chosen which organizations will participate in GSoC 2015. The list of accepted mentoring organizations will be published on March 2. Until that date, Tatoeba is not officially part of GSoC 2015.
Of course, this should not stop you from getting started on a project ahead of time. If you do so, we recommend the following.
- Make sure that you have read the GSoC FAQ and that you understand how the program works. Please check the calendar for the various deadlines.
- If the project you are interested in involves implementing code in the current version of Tatoeba, install Tatoeba on your machine, explore the code, and experiment with it.
- Start preparing your proposal. You won't be implementing anything (at least not anything related to a GSoC project) until you are officially a GSoC student for Tatoeba.
- If you would like to contribute code to get familiar with the project before GSoC, but don't know how to get started, you can read this guide.
Ideas
Mobile friendly user interface
Description
Around 30% of the visitors of Tatoeba are browsing the website from a mobile device, but the usability of the current website on mobile devices is very poor. The idea of this project is to redesign the UI to improve the user experience for visitors who are using a mobile device.
Discussion in Google group: GSoC 2015 - Mobile friendly user interface
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
PHP, HTML, CSS
Possible mentors
Trang
Extension of the search feature
Description
The search feature is currently available only for sentences and the search criteria are limited to the source/target language and the sentence's text. The goal of this project would be:
- To implement more search criteria (tags, username, audio, date...) (See issue #53)
- To extend the search feature to comments, wall messages, and possibly other content (private messages, profiles...).
Here are some examples of search we would like to be able to do:
- Get all sentences in a given language by a given user that have not been translated into a given language. For example: Show me all English sentences by user "CK" not yet translated into Japanese.
- Same as above, but limited to sentences with audio. For example: Show me all English sentences by "CK" with audio that have not been translated into Japanese.
- Get all sentences in a given language with a certain tag not translated into a given language. For example: Show me all Georgian sentences with the tag "restaurant" not translated into Armenian.
- Same as above, but limited to sentences by native speakers not translated into a given language. For example: Show me all Korean sentences by native speakers with the tag "weather" not translated into Japanese.
- Get all sentences in a given language under a certain length not yet translated into a given language. For example: Show me all Japanese sentences fewer than 50 characters in length not translated into French.
- Same as above, but limited to sentences by a given user.
- Get all sentences of a given language that match a given search keyword that have not been translated into a given language. For example: Show all English sentences with the word "mountain" not translated into Japanese.
- Same as above, but limited to sentences by a given user.
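The example searches above all combine a few independent criteria. A minimal sketch of that combination is given here in Python, purely as an illustration: it is not Tatoeba's CakePHP or Sphinx code, and the `Sentence` fields, the `search` function, and its parameter names are all hypothetical. Keyword matching is simplified to a case-insensitive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record; field names are assumptions, not Tatoeba's schema.
    id: int
    lang: str                     # language code, e.g. "eng"
    text: str
    owner: str                    # username of the author
    has_audio: bool = False
    tags: set = field(default_factory=set)
    translation_langs: set = field(default_factory=set)  # languages it is translated into


def search(sentences, lang, not_translated_into=None, owner=None,
           tag=None, with_audio=None, shorter_than=None, keyword=None):
    """Return sentences in `lang` that satisfy every criterion that is not None."""
    results = []
    for s in sentences:
        if s.lang != lang:
            continue
        if not_translated_into is not None and not_translated_into in s.translation_langs:
            continue
        if owner is not None and s.owner != owner:
            continue
        if tag is not None and tag not in s.tags:
            continue
        if with_audio is not None and s.has_audio != with_audio:
            continue
        if shorter_than is not None and len(s.text) >= shorter_than:
            continue
        if keyword is not None and keyword.lower() not in s.text.lower():
            continue
        results.append(s)
    return results


# Small made-up corpus for illustration only.
corpus = [
    Sentence(1, "eng", "I climbed the mountain.", "CK", has_audio=True,
             translation_langs={"fra"}),
    Sentence(2, "eng", "It is raining.", "CK", translation_langs={"jpn"}),
    Sentence(3, "eng", "The mountain is high.", "other_user"),
]

# "Show me all English sentences by CK with audio not translated into Japanese."
hits = search(corpus, "eng", not_translated_into="jpn", owner="CK", with_audio=True)
```

Because each criterion is optional and they are all combined with a logical AND, every example in the list maps to one call with a different set of arguments. A real implementation would push these filters into the search index rather than scan sentences in application code.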
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP, Sphinx
Possible mentors
gillux
Wish list for words and expression
Description
The wish list for words and expressions allows users to add words and expressions to a list and other users can fulfill the wishes by adding sentences with these words and expressions.
The implementation of this feature consists of three new views/pages: "Add to wish list", "Browse wish list", and "Wish: xxx in language by user username".
- "Add to wish list" is a page where users can submit new wishes.
- "Browse wish list" is a page where users can browse the wishes the other users have submitted.
- "Wish: xxx in language by user username" is a page for each individual wish where the original submitter of the wish can modify the wish, other users can fulfill the wish, and all users can discuss the wish.
At the upper part of all of these pages there are two tabs/links: "Add to wish list" and "Browse wish list" for easy access from page to page.
See more detailed description: Wish list for words and expression
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP
Possible mentors
gillux
Achievement system
Description
The goal of this project is to implement a system of achievements that would give users specific tasks to do and reward them with a badge/medal when they complete the tasks.
Such a system would be particularly helpful for new contributors, because Tatoeba is still not very intuitive. At the moment, when a user registers, they are redirected to a "Getting started" page whose information is too dense and which most of them probably don't read. The badge system would guide these new contributors into learning about the features of Tatoeba progressively.
This can of course also make contributing more engaging for the more advanced contributors.
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP, MySQL, knowledge about gamification
Possible mentors
Trang
Improvement of communication tools
Description
The Wall is the main place for members to communicate with each other publicly. There are, however, no categories like in a regular forum. All the topics are mixed together. As a result, one cannot easily find all the posts where people introduce themselves, or all the posts where people submit suggestions, or all the posts that are announcements from the admins.
The private messages are very old-fashioned. There is no notion of a discussion thread, and therefore each message is displayed alone, even if it was a reply to a previous message. This makes it rather impractical to have a conversation with private messages.
The goal of this project is:
- to improve the Wall, or possibly replace it with a forum, or implement a forum in addition to the Wall.
- to change the private messages system to display all the messages from the same discussion in a single thread, rather than separated into several private messages.
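One way to picture the threading goal is to group messages by the first message of their discussion. The sketch below is a Python illustration only, not Tatoeba code. It assumes a hypothetical `reply_to` field on each message that holds the id of the message being answered, and it assumes replies always point to an earlier message, so there are no cycles.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical record; `reply_to` is an assumed field, not Tatoeba's schema.
    id: int
    sender: str
    body: str
    reply_to: Optional[int] = None  # id of the message this one answers, if any


def thread_root(message, by_id):
    """Follow reply_to links up to the first message of the discussion."""
    current = message
    while current.reply_to is not None and current.reply_to in by_id:
        current = by_id[current.reply_to]
    return current.id


def group_into_threads(messages):
    """Map each root message id to the messages of its thread, ordered by id."""
    by_id = {m.id: m for m in messages}
    threads = defaultdict(list)
    for m in sorted(messages, key=lambda m: m.id):
        threads[thread_root(m, by_id)].append(m)
    return dict(threads)


# Made-up inbox: messages 1, 2 and 3 form one conversation, 4 stands alone.
inbox = [
    Message(1, "alice", "Hello!"),
    Message(2, "bob", "Hi, how are you?", reply_to=1),
    Message(4, "carol", "Unrelated question"),
    Message(3, "alice", "Fine, thanks.", reply_to=2),
]
threads = group_into_threads(inbox)
```

Storing a thread identifier directly on each message would avoid walking the reply chain at display time; the chain walk is used here only to keep the sketch short.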
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP, MySQL
Possible mentors
gillux
Permissions management
Description
The permissions of a user are based mostly on the user's status: depending on whether you are a contributor, advanced contributor, corpus maintainer or admin, you will have access to more or fewer features. For instance, advanced contributors can add tags to a sentence, while regular contributors cannot. Corpus maintainers can delete sentences, while other contributors cannot.
The goal of this project is to design and implement a more refined permission system, with an interface to manage these permissions.
Here are examples of things that we cannot do at the moment, and that could be part of the project:
- Prevent a user from adding new sentences, but still allow them to translate sentences.
- Restrict the languages in which a user can contribute.
- Prevent a user from posting on the Wall, while still allowing them to comment on sentences.
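A finer-grained system could keep the status-based defaults and layer per-user restrictions on top of them. The following Python sketch illustrates that idea under stated assumptions: the action names, the `User` fields, and the `can` function are hypothetical, and the assumption that each status inherits the permissions of the one below it is mine, not a description of how Tatoeba works.

```python
from dataclasses import dataclass, field
from typing import Optional

# Baseline permissions per status. Action names are illustrative, and the
# inheritance between statuses is an assumption made for this sketch.
_CONTRIBUTOR = {"add_sentence", "translate_sentence", "comment_sentence", "post_wall"}
_ADVANCED = _CONTRIBUTOR | {"add_tag"}
_MAINTAINER = _ADVANCED | {"delete_sentence"}

STATUS_PERMISSIONS = {
    "contributor": _CONTRIBUTOR,
    "advanced_contributor": _ADVANCED,
    "corpus_maintainer": _MAINTAINER,
}


@dataclass
class User:
    name: str
    status: str
    denied: set = field(default_factory=set)   # actions revoked for this user only
    allowed_langs: Optional[set] = None        # None means no language restriction


def can(user, action, lang=None):
    """Check the status baseline first, then the per-user restrictions."""
    if action not in STATUS_PERMISSIONS.get(user.status, set()):
        return False
    if action in user.denied:
        return False
    if lang is not None and user.allowed_langs is not None and lang not in user.allowed_langs:
        return False
    return True


# The three cases listed above, expressed as per-user restrictions.
no_new_sentences = User("user_a", "contributor", denied={"add_sentence"})
french_only = User("user_b", "contributor", allowed_langs={"fra"})
no_wall = User("user_c", "advanced_contributor", denied={"post_wall"})
```

With this shape, `can(no_new_sentences, "translate_sentence")` is still true while `can(no_new_sentences, "add_sentence")` is false, and a management interface would only need to edit the `denied` and `allowed_langs` values of a user.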
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP, MySQL
Possible mentors
?
Audio
Description
Tatoeba provides audio for some sentences. These recordings are made by volunteers, and the process of contributing audio is a bit complicated, because audio was not at the core of the project.
Audio is still a great addition to the project, and Tatoeba has received more and more audio contributions over the years. But the audio content lacks the structure that the sentences in the textual corpus benefit from.
- There is no way to know (from the website) who the author of an audio file is, nor when it was contributed (cf. GitHub issue #547).
- It is not possible either to attach several audio recordings to the same sentence (to illustrate different accents of the same language, for instance).
- It is a bit tedious to update and maintain the audio. Contributors have to follow a certain procedure, then their audio has to be uploaded to the server, then one of the server admins has to run a script to update the database. Surely we can make this simpler.
- It would also be nice if users could record audio directly through the web page.
The goal of this project would be to implement the necessary features for better management of the audio content in Tatoeba.
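The first two limitations in the list above are mostly a matter of data modeling: each recording needs its own author and date, and a sentence needs to accept more than one recording. A minimal Python sketch of such a model follows. It is an illustration only; the `AudioRecording` and `AudioStore` names and their fields are hypothetical and do not describe Tatoeba's database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class AudioRecording:
    # Hypothetical record; field names are assumptions made for this sketch.
    sentence_id: int
    author: str            # who contributed the recording
    created: datetime      # when it was contributed
    accent: str = ""       # optional free-form label, e.g. a regional accent


class AudioStore:
    """In-memory stand-in for a table that links many recordings to one sentence."""

    def __init__(self):
        self._by_sentence = {}

    def add(self, recording):
        self._by_sentence.setdefault(recording.sentence_id, []).append(recording)

    def for_sentence(self, sentence_id):
        # Return a copy so callers cannot modify the stored list.
        return list(self._by_sentence.get(sentence_id, []))


store = AudioStore()
store.add(AudioRecording(42, "speaker_one", datetime(2015, 2, 1, tzinfo=timezone.utc), "accent A"))
store.add(AudioRecording(42, "speaker_two", datetime(2015, 2, 3, tzinfo=timezone.utc), "accent B"))
recordings = store.for_sentence(42)
```

Keeping recordings in their own collection, keyed by sentence, is what makes several accents per sentence possible, and the author and date fields answer the attribution question raised in the list.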
Deliverables
Implementation in Tatoeba's source code.
Prerequisite knowledge
CakePHP if implementation in Tatoeba.
Possible mentors
?