- Posted by Stephen Whiteley
- On 16/05/2019
Google Translate is a free tool that has become the internet’s go-to resource for quick translations of short texts. (Most people would agree that for a top-quality translation you should still go to a serious language service provider, or LSP.)
All you have to do is select the language pair, copy and paste the text you want translated into the box, and watch as your computer instantly produces a translated version.
The process might seem straightforward and simple at first sight; however, behind this seemingly simple input-output process is a complex system of machine learning.
In the beginning (or at least when Google Translate was first launched), the system relied on statistics to translate any chunk of text. As such, it needed millions and millions of human-translated documents, books, and texts to look for patterns. Gaining access to this vast body of works in different languages was of course difficult.
However, Google Translate made special use of translated UN documents, which allowed it to compare several human-translated versions of the same text across the six official UN languages.
This way, the machine was able to produce better translations: it could weigh a plethora of candidate translations and select the version that occurred most often.
For instance, based on statistics, the system can detect that the French phrase “voyage de noces” is not rendered in English with some word-for-word translation suggesting wedding travel. Rather, it is usually translated as “honeymoon”.
So, by analyzing human-translated documents, the machine can detect frequently used versions of any one expression.
The implication of this is that the machine needs access to linguistic databases large enough to allow accurate statistical analysis.
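The frequency-based selection described above can be sketched in a few lines. This is a toy illustration, not Google’s implementation: the candidate list below is made-up data standing in for the renderings a statistical system would observe in a corpus of human translations.

```python
from collections import Counter

# Made-up observations: English renderings of the French phrase
# "voyage de noces" found across human-translated documents.
observed_translations = [
    "honeymoon", "honeymoon", "wedding trip", "honeymoon", "honeymoon trip",
]

def most_frequent_translation(candidates):
    """Pick the rendering that occurs most often in the corpus,
    the way a statistical system weighs its options."""
    counts = Counter(candidates)
    return counts.most_common(1)[0][0]

print(most_frequent_translation(observed_translations))  # honeymoon
```

The real system scores phrases over millions of documents rather than a handful of strings, but the principle is the same: the most frequently attested translation wins.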
Also, in its early days Google Translate relied on English as an intermediary (pivot) language when translating between any two other languages.
For instance, when translating from Igbo (a West African language) to Arabic, Google Translate would first translate the text from Igbo to English, and then from English to Arabic.
Given that there are more documents available in English than in any other language, it makes sense then for the machine to go through those iterations.
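The pivot process amounts to chaining two translation steps. Here is a minimal sketch using toy bilingual dictionaries as stand-ins for the trained translation models; the entries are illustrative, not real system data.

```python
# Toy dictionaries standing in for full translation models (made-up entries).
igbo_to_english = {"nnọọ": "welcome"}
english_to_arabic = {"welcome": "أهلاً"}

def pivot_translate(text, src_to_en, en_to_tgt):
    """Translate via English: source -> English -> target,
    as early Google Translate did for most language pairs."""
    english = src_to_en[text]
    return en_to_tgt[english]

print(pivot_translate("nnọọ", igbo_to_english, english_to_arabic))
```

The downside of pivoting is that errors compound: any nuance lost in the source-to-English step can never be recovered in the English-to-target step.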
So, as it turns out, Google Translate was pretty adept at translating short texts. However, things started getting a bit weird when you had a large chunk of text to translate, or a text stuffed with complex grammar or linguistic nuances perceptible only to humans.
In 2016, however, Google announced that the Google Translate service would now be based on a new learning premise – neural machine translation. This deep learning system lets the machine compare texts from broad sources at the same time, ensuring that the translation process takes into account the full context of a sentence instead of translating words and phrases in isolation.
Here’s how it works: Google Translate compares, say, Chinese-to-English translations with Japanese-to-English translations. It then makes a connection between Chinese and Japanese, allowing it to translate directly from Chinese to Japanese without an English intermediary.
And as the translation engine handles these translations over and over again, it begins to spot patterns between words in different languages and thus becomes better at translating those language pairs.
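One way to picture this direct connection is a shared representation space: words from every language are mapped to vectors, and translation becomes a nearest-neighbour lookup in the target language, with no English step in between. The sketch below is purely illustrative – the two-dimensional vectors are hand-made, whereas a real neural system learns high-dimensional embeddings from many language pairs.

```python
import math

# Hand-made toy "shared space" vectors for a few words (not real model data).
embeddings = {
    ("zh", "猫"): (0.90, 0.10),  # Chinese "cat"
    ("ja", "猫"): (0.88, 0.12),  # Japanese "cat"
    ("zh", "狗"): (0.10, 0.90),  # Chinese "dog"
    ("ja", "犬"): (0.12, 0.88),  # Japanese "dog"
}

def translate_direct(word, src, tgt):
    """Zero-shot-style lookup: return the target-language word whose
    shared embedding is closest to the source word's -- no pivot language."""
    v = embeddings[(src, word)]
    best, best_dist = None, float("inf")
    for (lang, w), u in embeddings.items():
        if lang != tgt:
            continue
        d = math.dist(v, u)
        if d < best_dist:
            best, best_dist = w, d
    return best

print(translate_direct("狗", "zh", "ja"))  # 犬
```

Because similar meanings land near each other in the shared space regardless of language, the same lookup works for language pairs the system was never explicitly trained to translate between.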