
What is Neural Machine Translation (NMT)?

Over the past year, professionals have talked a lot about NMT. So what is NMT?

Neural machine translation (NMT) is a machine translation approach that uses a large artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.

Until recently, machine translation products (websites or apps) were based on algorithms that use statistical methods to guess the best possible translation for a given word. This technology is called statistical machine translation.

However, one limitation of statistical machine translation is that it only translates a word within the context of a few words before and after it. For short sentences this works fairly well; for longer ones, translation quality can vary considerably.
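This window limitation can be caricatured in a few lines of code. The sketch below is hypothetical (the function and window size are invented for illustration, not taken from any real statistical MT system): it simply shows that if a system only looks two words to either side of “dog”, the decisive word “puppies” at the end of the sentence is invisible to it.

```python
# Caricature of statistical MT's limited context: the translation of a
# word may only depend on its immediate neighbours, not the full sentence.

def window(words, i, size=2):
    """Words within `size` positions of words[i], excluding the word itself."""
    lo, hi = max(0, i - size), min(len(words), i + size + 1)
    return words[lo:i] + words[i + 1:hi]

sentence = "the dog just gave birth to six puppies".split()
i = sentence.index("dog")

# Only ['the', 'just', 'gave'] is visible -- "puppies" is out of reach,
# so a window-based system has no clue the dog is female.
print(window(sentence, i))
```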

Now we have a new machine learning technology called deep learning, or deep neural networks, which tries to mimic how the human brain works (at least partially).

At a high level, neural network translation works in two stages:

— A first stage models the word that needs to be translated based on the context of this word (and its possible translations) within the full sentence, whether the sentence is 5 words or 20 words long.

— A second stage then translates this word model (not the word itself but the model the neural network has built of it), within the context of the sentence, into the other language.

One way to think about neural network-based translation is to imagine a fluent speaker of another language who sees a word, say “dog”. This would create the image of a dog in his or her brain, and that image would then be associated with, for instance, “le chien” in French. The neural network would intrinsically know that the word “chien” is masculine in French (“le”, not “la”). But if the sentence were “the dog just gave birth to six puppies”, it would picture the same dog with puppies nursing and would automatically use “la chienne” (the feminine form of “le chien”) when translating the sentence.
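The intuition above can be caricatured with hand-written rules. This is a hypothetical sketch (the function name and the cue list are invented for illustration), not how a neural network actually learns context: a real network infers such cues from training data rather than from an explicit list.

```python
# Toy illustration of context-sensitive word choice (NOT a real neural
# network): the rendering of "dog" is refined by the rest of the sentence
# before a French surface form is chosen.

def translate_dog(sentence: str) -> str:
    """Pick a French rendering of 'dog' based on sentence context."""
    context = sentence.lower()
    # Cues that the dog is female (e.g. giving birth to puppies).
    female_cues = ("gave birth", "puppies", "her litter")
    if any(cue in context for cue in female_cues):
        return "la chienne"   # feminine form
    return "le chien"         # default masculine form

print(translate_dog("The dog sleeps."))                          # le chien
print(translate_dog("The dog just gave birth to six puppies."))  # la chienne
```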

Because of this approach, sentences generated by neural network-based machine translation are usually not only more accurate than statistical ones but also sound more fluent and natural, as if a human had translated them and not a machine.

Source: Microsoft


Machine and manual translation – 1

Machine translation, also known as computer-aided translation, is basically the use of software programs specifically designed to translate both spoken and written texts from one language to another.

Trados is one of several computer-assisted translation tools (CAT tools). Its primary function is to allow translators to reuse translations. SDL purchased Trados a few years ago, and the products are now generally branded « SDL Trados ».

The advantages of using machine translation include the fact that you can easily produce documents in several languages. In my experience it is particularly useful for specialised texts (medical, technical, legal). Any CAT tool is good for the parts of texts that repeat: if you have to translate extracts from business registers, school-leaving certificates, birth certificates, legal records or other such documents, a CAT tool will serve you well. One extremely good thing is that Trados keeps the original formatting, so you usually don't have to deal with the visual form of a document. When several people work on large projects, it helps them use the same terminology and thus increases the overall quality of the translated documents.

Another advantage of CAT tools is terminology handling. A good MultiTerm glossary can be extremely useful for legal documents, too. If you are dealing with repetitive texts that are crawling with specific terminology, then this software is the tool for you as well. A further advantage is that with CAT tools you cannot accidentally leave out a sentence, something that can all too easily happen when overwriting. Finally, Trados is an excellent way to review other people's texts.

The main disadvantage of using machine translation is its cost, although the investment can be recouped in a fairly short time. Several people have said that Trados (or any CAT tool) is no good for creative texts, literary translation and the like. Another disadvantage is that the accuracy of the translated text depends on the word order of the original.

Nuances, cultural differences, and very local vocabulary still need to be translated by a person. Machine translation follows systematic and formal rules, so it cannot take the wider context into account to resolve ambiguity, nor can it draw on experience or judgement the way a human translator can.


Translation memories

A translation memory is a linguistic database that continually captures your translations as you work, storing them for future use. All previous translations are accumulated within the translation memory (in source-and-target-language pairs called translation units) and reused so that you never have to translate the same sentence twice. The more you build up your translation memory, the faster you can complete subsequent translations, enabling you to take on more projects and increase your revenue.
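The core mechanism, storing translation units and reusing them on exact repeats, can be sketched in a few lines. This is a minimal, hypothetical illustration (the class, method names and example sentences are invented); real TM tools persist segments in databases or interchange formats such as TMX files rather than an in-memory dict.

```python
# Minimal sketch of a translation memory: source/target sentence pairs
# ("translation units") stored as you work, then reused on exact repeats.

class TranslationMemory:
    def __init__(self):
        self.units = {}  # source segment -> target segment

    def store(self, source: str, target: str) -> None:
        """Record a translation unit for future reuse."""
        self.units[source] = target

    def lookup(self, source: str):
        """Return the stored translation, or None if never translated."""
        return self.units.get(source)

tm = TranslationMemory()
tm.store("Click OK to continue.", "Cliquez sur OK pour continuer.")

# The same sentence never has to be translated twice:
print(tm.lookup("Click OK to continue."))  # Cliquez sur OK pour continuer.
print(tm.lookup("A brand-new sentence."))  # None
```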

Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, a word-processing program, terminology management systems, a multilingual dictionary, or machine translation output.

Research indicates that many companies producing multilingual documentation use translation memory systems. In a 2006 survey of language professionals, 82.5% of 874 respondents confirmed that they used a TM. TM usage correlated with text types characterised by technical terms and simple sentence structure (technical, and to a lesser degree marketing and financial content), with computing skills, and with repetitiveness of content.

The 1970s were the infancy stage of TM, in which scholars carried out a preliminary round of exploratory discussion. The original idea for TM is often attributed to Martin Kay's « Proper Place » paper, although it gives few details. The paper does show the basic concept of the storage system: « The translator might start by issuing a command causing the system to display anything in the store that might be relevant to …. Before going on, he can examine past and future fragments of text that contain similar material. » This observation of Kay's was in fact influenced by Peter Arthern's suggestion that translators could draw on similar, already translated documents online. In his 1978 article Arthern gave a full demonstration of what we today call a TM system: « Any new text would be typed into a word processing station, and as it was being typed, the system would check this text against the earlier texts stored in its memory, together with its translation into all the other official languages [of the European Community]. … One advantage over machine translation proper would be that all the passages so retrieved would be grammatically correct. In effect, we should be operating an electronic ‘cut and stick’ process which would, according to my calculations, save at least 15 per cent of the time which translators now employ in effectively producing translations. »

Alan Melby and his group at Brigham Young University have also been credited as founding fathers of TM systems. The idea grew out of the ALPS (Automated Language Processing Systems) tools first developed by researchers at Brigham Young University, where the idea of TM was at that time mixed with a tool called « Repetitions Processing », which aimed only to find matching strings. Only much later did the concept of the translation memory as such come into being.

The real exploratory stage of TM systems was the 1980s. One of the first implementations appeared in Sadler and Vendelmans' Bilingual Knowledge Bank. A Bilingual Knowledge Bank is a syntactically and referentially structured pair of corpora, one being a translation of the other, in which translation units are cross-coded between the corpora. Its aim is to develop a corpus-based, general-purpose knowledge source for applications in machine translation and computer-aided translation (Sadler & Vendelman, 1987). Another important step was made by Brian Harris with his « bi-text ». He defined the bi-text as « a single text in two dimensions » (1988), the source and target texts related by the activity of the translator through translation units, which echoes Sadler's Bilingual Knowledge Bank. In his work Harris proposed something like a TM system without using that name: a database of paired translations, searchable either by individual word or by « whole translation unit », in the latter case the search being allowed to retrieve similar rather than identical units.

TM technology only became commercially available on a wide scale in the late 1990s, thanks to the efforts of several engineers and translators, including Alan Melby, Sumita and Tsutsumi. Most worth mentioning is the first TM tool, Trados (SDL Trados nowadays). In this tool, the user opens the source file and applies the translation memory, so that any « 100% matches » (identical matches) or « fuzzy matches » (similar, but not identical matches) within the text are instantly extracted and placed in the target file. The « matches » suggested by the translation memory can then be either accepted or overridden with new alternatives. If a translation unit is updated manually, it is stored in the translation memory for future use, as well as for repetition in the current text. In a similar way, all segments in the target file without a « match » are translated manually and then added automatically to the translation memory. Another significant milestone for TM systems was the series of projects at IBM's European Language Services (Denmark), in which massive translation memories were used to remove language barriers.
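The distinction between « 100% matches » and « fuzzy matches » can be sketched with a similarity score. Real CAT tools use their own proprietary scoring; the sketch below uses Python's standard difflib instead, and the 0.75 threshold is an arbitrary assumption for illustration, not a Trados setting.

```python
import difflib

def best_match(segment, memory, threshold=0.75):
    """Find the most similar stored source segment above a threshold.

    Returns (match_type, suggested_translation), or (None, None)
    when nothing in the memory is similar enough.
    """
    best_score, best_source = 0.0, None
    for source in memory:
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_source = score, source
    if best_source is None or best_score < threshold:
        return None, None
    kind = "100% match" if best_score == 1.0 else "fuzzy match"
    return kind, memory[best_source]

memory = {"Press the red button.": "Appuyez sur le bouton rouge."}

print(best_match("Press the red button.", memory))    # identical: reused as-is
print(best_match("Press the green button.", memory))  # similar: fuzzy suggestion
print(best_match("Totally unrelated text.", memory))  # no usable suggestion
```

A translator would accept or override the fuzzy suggestion, and the corrected unit would then be stored back into the memory, exactly the accept-or-override cycle described above.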