
What Is Neural Machine Translation (NMT)?

Last year, professionals talked a great deal about NMT. But what is NMT?

Neural machine translation (NMT) is a machine translation approach that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

Until recently, machine translation products (websites or apps) were based on algorithms that use statistical methods to guess the best possible translation for a given word. This technology is called statistical machine translation.

However, one of the limitations of statistical machine translation is that it translates each word only within the context of a few words before and after it. For short sentences, it works fairly well. For longer ones, the translation quality can vary.
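The limited-context problem can be illustrated with a small Python sketch. This is a deliberate simplification, not how any real statistical system is built: the function name `visible_context`, the window size, and the example sentence are all assumptions chosen for illustration. It only shows which words a fixed window around a target word can and cannot "see".

```python
def visible_context(tokens, index, window):
    """Return the words within `window` positions of tokens[index],
    excluding the target word itself. Hypothetical helper for illustration."""
    start = max(0, index - window)
    end = min(len(tokens), index + window + 1)
    return tokens[start:index] + tokens[index + 1:end]


sentence = "the dog just gave birth to six puppies".split()
target = sentence.index("dog")

# A narrow window (assumed size: 2 words on each side of the target).
narrow = visible_context(sentence, target, window=2)

# A window wide enough to cover the whole sentence.
full = visible_context(sentence, target, window=len(sentence))

print(narrow)
print("birth" in narrow, "birth" in full)
```

With the narrow window, the words that reveal the dog is female fall outside the visible context, which is the kind of information a whole-sentence model can use.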

Now we have a new machine learning technology called deep learning or deep neural networks, one that tries to mimic how the human brain works (at least partially).

At a high level, neural network translation works in two stages:

— A first stage models the word that needs to be translated based on the context of this word (and its possible translations) within the full sentence, whether the sentence is 5 words or 20 words long.

— A second stage then translates this word model (not the word itself but the model the neural network has built of it), within the context of the sentence, into the other language.

One way to think about neural network-based translation is to imagine a person fluent in another language who sees a word, say “dog”. This creates the image of a dog in his or her brain, and this image is then associated with, for instance, “le chien” in French. The neural network would intrinsically know that the word “chien” is masculine in French (“le”, not “la”). But if the sentence were “the dog just gave birth to six puppies”, it would picture the same dog with puppies nursing and would automatically use “la chienne” (the female form of “le chien”) when translating the sentence.

Because of this approach, sentences generated by neural network-based machine translation are not only usually better than those produced by statistical machine translation, but also sound more fluent and natural, as if a human had translated them and not a machine.

Source: Microsoft


The World Wide Web and Translators


In today’s translation market, with its online jobs and tight deadlines, the Internet has become one of the resources translators commonly use for fast and easy access to translationally-relevant information. The online translation resources translators draw on range from online glossaries to online corpora. This, however, is not the only application of the Web for translators. In fact, the Web itself is a valuable source of information for translators, far beyond what dictionaries have to offer. At the core of this use of the Web as a source of linguistic information is the concept of the Web as a corpus. This paper discusses that concept and elaborates on the applications of the World Wide Web for translators.


There is no doubt that the Internet is one of the greatest inventions of the 20th century and that it has had a tremendous impact on our everyday lives. It has not only eased access to information but has also provided a new mode of communication across the globe. This valuable source of information also has a lot to offer translators. The present paper elaborates on the applications of the Internet for translators. It further focuses on the concept of the Web as a big corpus and its implications for translators.


Some years ago, dictionaries and possibly typewriters were translators’ best friends. Today, however, computers with an Internet connection seem indispensable to what translators do. As the results of a comprehensive 2005 survey within an EU-funded project show, around 95% of the participants, who were translation students and professional translators, made use of the Web for translation-related tasks. In fact, it can be claimed that the Web has turned into a new resource for today’s translators. Now let us see what the Web has to offer translators.


A large number of free dictionaries, glossaries and term banks in various languages and specialized fields are available online and can be easily accessed by users around the world. The Internet can thus be considered a virtual library of such resources for translators. While in the past translators may have had to leave the comfort of their homes or offices to get access to such resources, or have had to invest greatly in them, today they have all kinds of resources at their fingertips.


The present translation market is above all marked by its online nature. A considerable number of translation jobs today are posted to online translation portals, where translators from all over the world can quote for and win jobs. Besides being a source of translation-related tools such as dictionaries and glossaries, the Internet can thus be said to have become a primary channel for clients and translators to communicate and work together.

With the Internet at their disposal, translators can get jobs from clients all over the world. In other words, the Internet has considerably changed the nature of the translation market.


There are also a number of free machine translation systems available on the Internet. Though such online MT systems are generally intended for non-professional, occasional users, they may come in handy in professional translation as well. Babelfish, Systran, SDL Free Translation and Google Translate are some of the best-known machine translation systems available online.


The Web itself can be regarded as a valuable source of translationally-relevant information for translators. Considering the Web as a big corpus of texts in various languages, translators can extract valuable translationally-relevant information from it using either search engines or web concordancers. Before going any further into this discussion, it is necessary to first elaborate on the concept of the Web as a corpus.


To see whether the Web has what it takes to be considered a corpus, this section compares the idiosyncrasies of the Web with the features of corpora.

Based on the various definitions put forward for corpora in the literature, corpora have three main features: they contain authentic texts; they contain texts sampled to be maximally representative of the language variety under study; and they are in machine-readable form (Tengku Mahadi, Vaezian & Akbari: 2010). Tognini-Bonelli (2001: 55) defines authenticity of texts in corpora in the following words: “All the material included in the corpus, whether spoken, written or gathered along any intermediate dimension, is assumed to be taken from genuine communication of people going about their normal business”. There is no doubt that the Web shares this feature of corpora, since online texts are real instances of language in use. As for the second feature, representativeness, the Web with its vast amount of texts in various languages can be said to contain representative samples of texts in those languages. Finally, the Web certainly shares the third feature of corpora, since online texts are in machine-readable form.

The Web, thus, can be said to more or less share the basic features of corpora. In fact, the Web can be considered a large multilingual monitor corpus which is constantly updated with new texts. Now let us see why translators would want to draw on the Web as a big corpus instead of real corpora.



If we consider the Web as a big corpus, we can simply consider the various search engines such as Google, Yahoo, and Bing as corpus analysis tools used to query this giant corpus. Search engines are in fact programs that examine sites and store information about their contents. So when a search is performed, the search engine searches the documents stored on the Web for the specified keywords and returns a list of documents containing those keywords. What search engines do is, in fact, very close to what concordance features in corpus analysis tools do, in that both search documents for the specified keywords and find the instances in which the keywords are used.
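To make the comparison with concordance tools concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is only an illustration of what a concordancer does, not a description of how any search engine or corpus tool is implemented; the `concordance` function, the `documents` dictionary, and the sample sentences are all invented for this example.

```python
import re


def concordance(documents, keyword, width=30):
    """Return (document id, keyword-in-context line) pairs for every
    whole-word occurrence of `keyword`. Hypothetical helper for illustration."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            # Take up to `width` characters on each side of the match.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append((doc_id, left.rjust(width) + "[" + match.group() + "]" + right))
    return lines


# A tiny stand-in "corpus"; real web pages would be far larger and messier.
documents = {
    "page1": "The term bank lists every approved term for the project.",
    "page2": "A translator checks each term in context before using it.",
    "page3": "Deadlines are tight in the online translation market.",
}

hits = concordance(documents, "term")
for doc_id, line in hits:
    print(doc_id, line)
```

Each returned line shows the keyword with its surrounding words, which is the view a translator uses to judge how a term behaves in real text.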

Some search engines, such as Google, have a number of features which allow users to refine their searches and make the most of them. From a translation perspective, some of the most useful features of the Google search engine are phrase search, search to exclude terms, and wildcard search. Phrase search allows users to look for an exact phrase by putting double quotation marks (" ") around a set of words. In a search to exclude terms, placing a minus sign (-) immediately before a word makes Google leave out of the results any pages containing that word. Finally, in a wildcard search, placing an asterisk (*) within a query makes Google treat the asterisk as a placeholder for any unknown term(s) and then find the best matches (Google guide: online).
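The three operators described above can be combined in a single query string. The following Python sketch simply assembles such strings; the `build_query` function and its parameter names are hypothetical, and the example phrases are invented. It does not contact any search engine.

```python
def build_query(phrase=None, exclude=(), extra_terms=()):
    """Assemble a search query string from the operators described above.

    phrase      -- exact phrase, wrapped in double quotation marks;
                   it may contain * as a wildcard placeholder
    exclude     -- words to exclude, each prefixed with a minus sign
    extra_terms -- ordinary keywords added as they are
    """
    parts = list(extra_terms)
    if phrase:
        parts.append('"' + phrase + '"')
    parts.extend("-" + word for word in exclude)
    return " ".join(parts)


# Phrase search: look for the exact expression.
print(build_query(phrase="translation memory"))

# Wildcard plus exclusion: find what fills the gap, skipping pages about jobs.
print(build_query(phrase="a * of translators", exclude=["jobs"]))
```

A translator unsure which collocate fits a phrase could paste the second kind of query into a search engine and compare the expressions that appear in place of the asterisk.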


Nowadays professional translators, more often than not, work with a variety of texts and text types. As Ulrych (2005: 22) has stated, “the idea that professional translators work predominantly in one or two specialist fields is in fact swiftly losing grounds…”. This situation requires translators either to have an encyclopedic knowledge of various specialized fields or to have the necessary resources to gain such knowledge when the need arises.

As Pym (1993: 114) states, specialization in the translation market implies that “a good translator is not someone who knows many things but someone who has the skills and contacts to find specific information when necessary”. The key to success in the present translation market can thus be said to lie in being resourceful, and the Web seems to have a lot to offer translators from this perspective. The World Wide Web can in fact be considered an invaluable resource for translators, in that it not only contains a vast amount of linguistic information about various languages and text types, but also provides translators with a channel to communicate with fellow translators, subject matter experts and, above all, clients.

By Tengku Sepora Tengku Mahadi