María Azqueta, Production Manager of SeproTec, and Diego Bartolomé, CEO of tauyou, describe the process they went through using machine translation between Spanish dialects, as presented at the GALA Conference 2013.
Nowadays, machine translation (MT) is used throughout the industry and almost everyone in the field has likely tried it at least once. Most people are aware of some of its uses and applications. For example, it’s well-known that some Language Service Providers (LSPs) use it to bring down their costs and meet tighter deadlines. Others use it to translate chat messages or e-mails in order to communicate with friends or contact businesses from different parts of the world. It is also helpful for getting a quick idea of what your foreign competitors are offering and for determining if the contents of a file, such as a legal document or patent, warrant the expense of a human translation. There are, nevertheless, still several unexplored applications which use MT and other Natural Language Processing (NLP) tools in a very different way. One of these unexplored applications is the subject of our presentation.
As readers might know, Latin American Spanish is composed of different Spanish dialects spoken throughout the region. Many companies ask LSPs to translate their marketing material into Latin American Spanish or International Spanish, as they want their products to gain a foothold in this vast market. Argentine Spanish, however, is very different from Mexican Spanish, which is, in turn, quite different from Chilean Spanish, and so forth. Therefore, if you really want to break into a specific market in Latin America, you must decide which country you want to target and localize your material for the Spanish dialect spoken in that country.
Then comes the tricky part: budget. Localizing a project for one language can be hard enough for companies to budget, so budgeting a localization project for four or five dialects of the same language can seem like a Herculean task. SeproTec thus endeavored to find a more cost-efficient solution for effectively reaching the target markets and decided that MT was the answer for making localization accessible and economical for its clients.
SeproTec and tauyou first considered a solution that seemed straightforward enough: directly applying MT to translate from English into the different Spanish dialects. The initial tests, however, showed that the resulting documents required extensive post-editing that was beyond the scope of the available budget. Furthermore, some clients were uncomfortable with the fact that marketing materials were being translated first by a machine.
After consultations with several clients, it was clear that they were willing to invest in a human translation, if not for all of their markets – as not all countries have equal weight in corporate strategy – then at least for their most important targets. On the technical side, tauyou saw the feasibility of a Spanish-to-Spanish translation engine, which would adapt the Spanish to each country’s dialect (currently Argentina, Chile, Colombia, Mexico, and Puerto Rico, with more to come).
Implementation & Results
Before implementing the machine localization, SeproTec gathered all its translation memories and linguistic assets as well as resources from its clients, such as websites and internal correspondence. This documentation, along with other publicly available materials, helped tauyou to build an initial prototype of the engine.
In order to test the quality of this initial prototype, SeproTec localized several strategic documents written in European Spanish for Argentine, Chilean, Colombian, Mexican, and Puerto Rican Spanish. These localizations were performed both by the prototype engine and by a team of human translators and linguists, in order to compare and contrast the results of human and machine localization. Only a few bugs in the engine needed to be fixed to obtain the required quality level and, surprisingly, almost no post-editing of the machine localization was needed.
In essence, the following aspects were considered:
- Lexical variations
- Grammatical differences
It is important to point out that the human localizations also had to go through a process of revision, as some of the terms were not correctly adapted in their first iteration.
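To make the two aspects above more concrete, the following is a minimal sketch of how lexical and grammatical adaptation from European Spanish into a target dialect could be expressed as simple rules. It is not the actual SeproTec/tauyou engine, whose internals are not described here. The dialect codes, the `LEXICON` and `GRAMMAR` tables, and the `adapt` function are all hypothetical names, and the handful of entries are hand-picked illustrations rather than real linguistic assets.

```python
import re

# Hypothetical lexicon entries (European Spanish -> target dialect).
# A real system would hold far larger, curated word lists.
LEXICON = {
    "es-MX": {"móvil": "celular", "zumo": "jugo"},
    "es-AR": {"móvil": "celular", "coche": "auto"},
}

# Hypothetical grammar rules as (regex pattern, replacement) pairs.
GRAMMAR = {
    "es-MX": [(r"\bvosotros tenéis\b", "ustedes tienen")],
    "es-AR": [
        (r"\bvosotros tenéis\b", "ustedes tienen"),
        (r"\btú tienes\b", "vos tenés"),
    ],
}


def _match_case(source, target):
    """Capitalize the replacement if the matched text started with a capital."""
    return target.capitalize() if source[0].isupper() else target


def adapt(text, dialect):
    """Apply the grammar rules first, then word-level lexical substitutions."""
    for pattern, repl in GRAMMAR.get(dialect, []):
        text = re.sub(
            pattern,
            lambda m, r=repl: _match_case(m.group(0), r),
            text,
            flags=re.IGNORECASE,
        )

    lexicon = LEXICON.get(dialect, {})

    def swap(match):
        word = match.group(0)
        replacement = lexicon.get(word.lower())
        return _match_case(word, replacement) if replacement else word

    return re.sub(r"\w+", swap, text)


print(adapt("Vosotros tenéis un móvil nuevo.", "es-MX"))
print(adapt("Tú tienes un coche y un móvil.", "es-AR"))
```

The example entries were chosen so that the replacement keeps the gender of the original noun. A production system would also have to handle agreement, verb paradigms, and context-dependent terms, which flat substitution tables like these do not cover.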
Conclusion and further work
SeproTec’s clients are now able to localize their marketing material for the different Latin American markets they target with substantially reduced overhead. As with any technological tool, however, the localization engine still has room for improvement. Specifically, we are currently working on:
- Improving the lexicons and grammars. It’s important to correct as many words as possible, so we are continuously adding new words to the system. Some grammar rules are still not covered by the system. In this regard, tauyou is developing a simple web interface to handle terminology and grammar that will be made available in the near future.
- Extending the language coverage:
  - Including more Spanish dialects (Peruvian, Uruguayan, Guatemalan, etc.)
  - Incorporating other languages, such as automatic localization between UK English and American English and between European Portuguese and Brazilian Portuguese, in both directions. Moreover, as SeproTec and tauyou have already developed a very powerful MT engine for the Spanish-to-Portuguese combination, a new version specifically for Brazilian Portuguese is going to be developed in order to cover a greater area of the Latin American market.