Many science fiction stories provide for a ‘universal translator’, often using it as a convenient plot device to quickly allow individuals from two or more species to communicate from their first words.
Unfortunately, we here on Earth-prime haven’t yet developed such a thing, and that’s why translation across hundreds of language Wikipedias is so important: it lowers the cost of spreading knowledge across the world, as it allows multilingual editors to reuse efforts made by other volunteer editors to cover a topic.
To facilitate this process, we here at the Wikimedia Foundation developed a content translation tool that helps Wikipedia editors to easily translate articles. Content translation simplifies translating Wikipedia articles into different languages by automating many of the boring steps of the traditional translation process.
In early April, content translation reached a new milestone: more than 300,000 articles have been created since the tool was released three years ago, making this a good time to reflect on the impact of the tool and discuss future plans.
Content translation video made for the 100,000th translation. Video by Victor Grigas/Wikimedia Foundation, CC BY-SA 3.0. Due to browser limitations, the video will not play on Microsoft Edge, Internet Explorer, or Safari. Please try Mozilla Firefox instead, or watch it directly on Wikimedia Commons.
Thanks to the editors working with content translation, many topics are now available in new languages, making knowledge easier to access for more people around the world. As the tool has been adopted, the article creation rate has increased to the point where more than 400 new articles were being translated per day last month, or one new article every 3.5 minutes.
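The conversion between a daily rate and the interval between articles is simple arithmetic, sketched below. The function names are illustrative only and are not part of any Wikimedia tool. At exactly 400 articles per day the interval is 3.6 minutes; an interval of 3.5 minutes corresponds to roughly 411 articles per day, which is consistent with "more than 400".

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def minutes_per_article(articles_per_day: float) -> float:
    """Average number of minutes between two published articles."""
    return MINUTES_PER_DAY / articles_per_day


def articles_per_day(minutes_between: float) -> float:
    """Daily article count implied by an average interval in minutes."""
    return MINUTES_PER_DAY / minutes_between


if __name__ == "__main__":
    print(f"{minutes_per_article(400):.1f} minutes per article at 400/day")
    print(f"{articles_per_day(3.5):.0f} articles per day at one every 3.5 minutes")
```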
During these three years, we have received quite a bit of feedback about how the tool has helped Wikipedia editors in different communities. Many editors appreciate that the tool automates most of the manual steps they had to do before, and lets them focus on creating quality content. Expert translators from the Medical Translation Task Force, for example, estimated their productivity increased 17% with content translation, helping them to expand Wikipedia’s coverage of vaccine information faster than before.
In addition to reducing the content gap across languages, another goal of content translation was to generate quality content. Content translation allows translators to reuse the efforts that editors from other communities made on the source article (finding images, adding references and reviewing the content), which often leads to a better initial version of the article compared to starting from scratch. Our measurements indicate that, in most languages, the deletion rate of articles created with content translation is lower than that of new articles started from scratch. For example, Spanish had the highest number of translations in 2017. In that same year, fewer than 10% of the articles started with content translation were deleted, compared with a 52% deletion rate for new articles that were not created with the tool in the same period.
We want to share more details on how the tool has been adopted by different communities and on our plans for the future, which include a new version of the tool.
Part of the daily editor toolset on many wikis
Content translation has become part of the daily routine in many Wikipedia communities, accounting for a significant portion of the articles created on those wikis. For example, on the Tamil Wikipedia, 13% of the articles created since content translation was released have been created with this tool.
Tamil is a language spoken by 70 million people, but the Tamil Wikipedia has fewer than 150,000 articles. Tamil speakers can now read in their language about Mexican rag dolls, coronation ceremonies, long hair or more than seven thousand other topics created with content translation.
Tamil Wikipedia articles of different lengths, all created with content translation. Screenshots via the indicated articles on the Tamil Wikipedia, CC BY-SA 3.0. The enclosed images may be under different licenses.
On the Catalan Wikipedia, one of the tool’s early adopters, content translation has been used to start 19% of the articles created since the tool first became available. Other communities, such as the French Wikipedia, have also used content translation to create a significant number of articles (in their case, more than 24,000), but the larger size of that Wikipedia means that those represent only a small percentage of its total article production. However, on many other Wikipedias, large and small, content translation has not yet been established as a common tool for creating new articles. You can check the stats page on any Wikipedia for an overview, or query our APIs for a deeper analysis of the data about published content.
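As a rough illustration of querying the APIs, the sketch below only builds a request URL and does not send it. It assumes the content translation extension exposes a `cxpublishedtranslations` query list that accepts `from`, `to`, `limit` and `offset` parameters; please verify the module name and parameters against the API help page of the wiki you are querying before relying on them. The helper name `published_translations_url` is hypothetical.

```python
from urllib.parse import urlencode


def published_translations_url(
    wiki_lang: str,
    source: str,
    target: str,
    limit: int = 500,
    offset: int = 0,
) -> str:
    """Build a MediaWiki API URL listing published translations.

    Assumption: the `cxpublishedtranslations` list module and its
    parameter names are as described in the lead-in; check the wiki's
    own API documentation to confirm them.
    """
    params = {
        "action": "query",
        "list": "cxpublishedtranslations",  # assumed module name
        "from": source,    # source language code, e.g. "en"
        "to": target,      # target language code, e.g. "ta"
        "limit": limit,
        "offset": offset,
        "format": "json",
    }
    return f"https://{wiki_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    # English-to-Tamil translations, requested from the Tamil Wikipedia.
    print(published_translations_url("ta", "en", "ta", limit=10))
```

Fetching the URL with any HTTP client and paging through results with `offset` would then allow the kind of per-language analysis discussed above.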
The adoption of the tool is very different from language to language. Several factors may have an effect on this, including the availability of quality machine translation, the number of languages spoken by their editors, and the quantity of the content available in such languages.
Given the diversity of editors, languages and kinds of content, getting feedback about the use of the tool is essential. Please feel free to provide feedback about the use of content translation for your specific context on the project talk page.
The next version
The Language team is currently working on a new version of content translation. Version 2 will be a major refactoring and architectural update of the tool. The goal is to make content translation a solid and reliable translation tool that is aligned with the Wikimedia standards in technology and design, and that provides a great way for newcomers to contribute.
The new version will include a more powerful editing surface based on Visual Editor, which will allow us to address many of the most frequently requested features. Reliable support for undo/redo and copy & paste will give translators more freedom to manipulate their content. In addition, tools to insert and edit templates, tables, multimedia, categories, and more advanced kinds of content will allow editors to improve their translations further with new content before publishing them.
This is a large effort that will require rewiring many of the tools so that they work with the new editing surface. The plan is to gradually replace version 1 with version 2 in several stages. Backwards compatibility will make sure that content created by users during the transition period won’t be affected.
From now on, the focus for developing new features will be on version 2 in order to provide access to the improved version of the tool as soon as possible. The current version of content translation (version 1) will still get maintenance support to make sure that the tool is available for the users that rely on it on a daily basis.
A better experience for newcomers
We’ll take this opportunity to better align the designs in the new version to the Wikimedia design guidelines, and provide better support for new editors. We want to deliver a better experience for newcomers based on learnings from existing and new research on the experience of new editors.
Currently, new editors often struggle with some of the error messages they get from the tool. For example, their translation may trigger a “spam” error because a link that already existed in the source article points to a website that is blocked on the target wiki. In such cases, it is not obvious to new users what is going on, where the problem is and how to solve it. Guiding users through the process of reviewing their translation will help them improve the initial automatic translations further and produce a higher-quality initial article as a result.
We believe that with these improvements we can make translation a great way to start contributing to Wikipedia. You can check the project page for the new version to learn more about it, and use the discussion page to provide any feedback.
Pau Giner, Senior User Experience Designer, Audiences Design
- Traditionally, translating Wikipedia articles required much manual effort, including moving back and forth across tabs, copying and pasting from external language tools (e.g. dictionaries), reformatting content, and rewriting links to point to the right place.