The development and use of machine translation systems and computer-based translation tools

Historical background

Systems for automatic translation have been under development for 50 years – in fact, ever since the electronic computer was invented in the 1940s there has been research on its application to translating languages (Hutchins 1986). For many years, the systems were based primarily on direct translations via bilingual dictionaries, with relatively little detailed analysis of syntactic structures. By the 1980s, however, advances in computational linguistics allowed much more sophisticated approaches, and a number of systems adopted an indirect approach to the task of translation. In these systems, texts of the source language are analysed into abstract representations of ‘meaning’, involving successive programs for identifying word structure (morphology) and sentence structure (syntax) and for resolving problems of ambiguity (semantics). Included in the latter are component programs to distinguish between homonyms (e.g. English words such as light, which can be a noun, an adjective or a verb, and solution, which can be a mathematical or a chemical term) and to recognise the correct semantic relationships (e.g. in The driver of the bus with a yellow coat, where the coat belongs to the driver and not to the bus). The abstract representations are intended to be unambiguous and to provide the basis for the generation of texts into one or more target languages.

There have in fact been two basic ‘indirect’ approaches. In one, the abstract representation is designed to be a kind of language-independent ‘interlingua’, which can potentially serve as an intermediary between a large number of natural languages. Translation is therefore in two basic stages: from the source language into the interlingua, and from the interlingua into the target language. In the other indirect approach (in fact, the more common one), the abstract source representation is converted first into an equivalent representation for the target language. Thus there are three basic stages: analysis of the input text into an abstract source representation, transfer to an abstract target representation, and generation into the output language.
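The three stages of the ‘transfer’ approach can be illustrated with a deliberately tiny sketch. It is not taken from any of the systems discussed here: the lexicons, the Node structure, the function names and the single adjective-noun reordering rule are all hypothetical, and a real system would involve full morphological, syntactic and semantic analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One unit of the abstract representation: a lemma plus its category."""
    lemma: str
    pos: str


# Hypothetical toy lexicons (English -> French), for illustration only.
SOURCE_LEXICON = {
    "the": ("the", "DET"),
    "red": ("red", "ADJ"),
    "book": ("book", "NOUN"),
}
TRANSFER_LEXICON = {"the": "le", "red": "rouge", "book": "livre"}


def analyse(text):
    """Stage 1, analysis: source text -> abstract source representation."""
    return [Node(*SOURCE_LEXICON[word]) for word in text.lower().split()]


def transfer(nodes):
    """Stage 2, transfer: source representation -> target representation.

    Lexical transfer replaces each lemma; structural transfer applies one
    assumed rule, moving an adjective after the noun it precedes.
    """
    out = [Node(TRANSFER_LEXICON[n.lemma], n.pos) for n in nodes]
    i = 0
    while i < len(out) - 1:
        if out[i].pos == "ADJ" and out[i + 1].pos == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just reordered
        else:
            i += 1
    return out


def generate(nodes):
    """Stage 3, generation: target representation -> output text."""
    return " ".join(n.lemma for n in nodes)


def translate(text):
    return generate(transfer(analyse(text)))


if __name__ == "__main__":
    print(translate("the red book"))  # prints "le livre rouge"
```

In an interlingua design the middle stage would disappear: analysis would produce a language-independent representation, and generation would work directly from it. A ‘direct’ system, by contrast, would in essence go straight from source words to target words through the bilingual dictionary.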

Until the late 1980s, systems of all these kinds were developed, and it is true to say that all current commercially available systems are also classifiable into these three basic system types: direct, interlingual and ‘transfer’. The best known of the MT systems for mainframe computers are in fact essentially of the ‘direct translation’ type, e.g. the Systran, Logos and Fujitsu (ATLAS) systems. They are, however, improved versions of the type; unlike their predecessors, they are highly modular in construction and easily modifiable and extendable. In particular, the Systran system, originally designed for translation only from Russian into English, is now available for a very large number of language pairs: from and into English for most European languages (French, German, Italian, Spanish, Portuguese), Japanese, Korean, etc. Logos, originally marketed for German to English, is also now available for other languages: English into French, German, Italian and Spanish, and German into French and Italian. The Fujitsu ATLAS system, on the other hand, is still confined to translation between English and Japanese (in both directions).

Among the most important of the mainframe ‘transfer’ systems was METAL, supported for most of the 1980s by Siemens in Germany. However, it was only at the end of the decade that METAL came onto the market, and sales were poor. During the 1990s, rights to METAL have been transferred to two organisations (GMS and LANT) in a complex arrangement. But the best-known systems adopting the ‘transfer’ approach were research projects: Ariane at GETA in Grenoble, an MT project going back to the 1960s, and Eurotra, funded by the Commission of the European Communities. There were hopes that Ariane would become the French national system, and there were plans to incorporate it in a translator’s workstation for Eurolang (see below), but in the end nothing came of them. As for Eurotra, it was undoubtedly one of the most sophisticated systems, but after involving some hundred researchers in most countries of Western Europe for almost a decade, it failed to produce the working system that the sponsors wanted. It had been hoped that Eurotra would eventually replace the Systran systems that the Commission had acquired and was developing internally. In the late 1980s, Japanese governmental agencies began to sponsor an interlingua system for Asian languages, involving co-operation with researchers in China, Thailand, Malaysia and Indonesia. However, this project too has so far not produced a system after a decade of work. (For surveys of MT research and development in the 1980s and early 1990s see Hutchins 1993, 1994.)