Free online translation tools such as Google Translate have become ubiquitous over the last several years. The history of automated translation goes back to the years immediately following World War II, when it was investigated as a way to translate copious amounts of Russian-language content during the Cold War. Now, in the 21st century, the idea of fully automated translation has captured people’s imaginations and offered a glimpse of a future in which people can communicate seamlessly without spending years learning a foreign language.
While the technology has advanced rapidly over the past decade and is already offered as a service by many language service providers (LSPs), it is still far from matching the quality that human translators deliver. So what role does machine translation play today, and when is it appropriate to use?
First, it is important to understand exactly what machine translation is. According to the Globalization and Localization Association (GALA), “Machine translation (MT) refers to fully automated software that can translate source content into target languages.” Many types of MT have been developed for different purposes, and they take different approaches to automating translation.
Systems like Google Translate are known as “Generic MT.” Other types include “Customizable MT,” in which an MT engine is customized for specific language pairs and subject matter, and “Adaptive MT,” which offers suggestions to translators as they work in a Computer-Assisted Translation (CAT) tool and then “learns” from the translator’s input in real time. Note, however, that CAT tools are not machine translation; they are tools that human translators use to improve efficiency, consistency, and quality.
These MT systems typically follow one or more approaches, chosen based on the type of content to be translated, the level of quality required, and cost considerations. The four main approaches are:
- Rule-Based MT (RbMT), which uses dictionaries and grammar rules for both the source language (SL) and the target language (TL), plus an algorithmic system that links the two, to produce a translated output.
- Statistical Machine Translation (SMT), which matches patterns against reference texts to find the statistically most likely translation of a given portion of text.
- Neural Machine Translation (NMT), one of the most exciting developments in the field, in which machine learning (a branch of artificial intelligence) allows the MT engine to “learn” over time. NMT is the technology Google is now rolling out to replace its previous SMT model.
- Hybrid Machine Translation (HMT), which combines the RbMT and SMT approaches.
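To make the difference between the first two approaches more concrete, here is a deliberately tiny Python sketch. Everything in it is hypothetical toy data: the word lists, the single adjective-noun reordering rule, and the “aligned” phrase pairs are invented for illustration. Real RbMT engines rely on extensive dictionaries and grammars, and real SMT systems learn their phrase probabilities from millions of sentence pairs and combine them with a language model.

```python
from collections import Counter

# --- Rule-based sketch: bilingual dictionary plus one explicit grammar rule ---
# Hypothetical English -> Spanish lexicon and word classes (toy data only).
LEXICON = {"the": "la", "red": "roja", "house": "casa"}
ADJECTIVES = {"red"}
NOUNS = {"house"}


def rule_based(sentence: str) -> str:
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Transfer rule: English "adjective noun" becomes Spanish "noun adjective".
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Dictionary lookup; unknown words pass through unchanged.
    return " ".join(LEXICON.get(w, w) for w in reordered)


# --- Statistical sketch: pick the most probable translation seen in reference texts ---
# Hypothetical aligned phrase pairs, as if extracted from a bilingual corpus.
ALIGNED_PAIRS = [
    ("bank", "banco"), ("bank", "banco"), ("bank", "orilla"),
    ("river bank", "orilla del río"), ("river bank", "orilla del río"),
]


def build_phrase_table(pairs):
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    # Estimate P(target | source) = count(source, target) / count(source).
    table = {}
    for (src, tgt), n in pair_counts.items():
        table.setdefault(src, {})[tgt] = n / source_counts[src]
    return table


def statistical(phrase: str, table) -> str:
    candidates = table.get(phrase.lower())
    if not candidates:
        return phrase  # no statistics for this phrase: leave it untranslated
    # Choose the candidate with the highest estimated probability.
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = build_phrase_table(ALIGNED_PAIRS)
    print(rule_based("The red house"))       # la casa roja
    print(statistical("bank", table))        # banco
    print(statistical("river bank", table))  # orilla del río
```

The rule-based function only knows what it has been explicitly told, while the statistical function simply follows the frequencies in its reference data. That is why “bank” alone comes out as “banco,” but the longer phrase “river bank” is matched as a unit and translated correctly.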
There are many obvious advantages to adopting MT for a client’s translation needs. For example, for large volumes of content intended for internal use only, MT can be faster and more cost-effective than human translation. Additionally, for highly specialized, formulaic content (e.g., Japanese chemical patents), MT can produce reasonably high-quality output that needs only limited human editing on the back end.
MT may also be a viable option for large corporations that manage enormous amounts of content requiring translation and regular updates in multiple languages, along with complex content management requirements. MT engines can be tailored to a company’s specific needs (languages, terminology, etc.), and while the initial outlay may be significant, the savings over time can be considerable.
Of course, such companies would still need to maintain either in-house or outsourced human translators and editors to review the MT output, particularly for documents that are meant for external distribution and/or require a high degree of quality control, such as legal documents. MT systems can also be integrated with many content management systems (CMS) and CAT tools, providing a robust and integrated translation and content management environment.
For smaller companies, however, the initial cost of developing an MT engine is usually prohibitive. They can use existing systems specific to their industry, but the quality of that output varies greatly and typically requires significant effort from human editors at the back end of the translation workflow. After all, MT output covers only the “T” (Translation) portion of the standard TEP (Translation-Editing-Proofreading) workflow.
Another problem with MT is that although we live in an increasingly digital world, and much content is now created and stored in “soft” digital formats, many clients still keep hard-copy records, or their digital copies exist only as scanned PDFs. In these cases, MT engines cannot “read” the text to be translated until the content has been processed with Optical Character Recognition (OCR) software.
While OCR technology has improved over the past decade, it still usually takes many person-hours to reformat the output manually, verify that the conversion was accurate, and fix any errors. This is especially true of scanned PDFs (as opposed to digitally generated PDFs) that contain handwriting or are of poor quality. In such cases, the cost of manually reformatting and verifying the OCR output often outweighs any savings from MT.
For clients, though, MT has opened up new options for translating their content. While it may not be appropriate for every client or type of project, it is becoming an increasingly viable solution, and clients should consult their language service provider to weigh the pros and cons in terms of cost, speed, and quality against their own requirements.