
8.3 (Human-Aided) Machine Translation: A Better Future?

Christian Boitet
Université Joseph Fourier, Grenoble, France

As the term translation covers many activities, it is useful to distinguish, at least, between:

8.3.1 Types of MAT Systems Available in 1994

It is impossible to envisage an automation of re-creation translation and of localization which would go beyond machine aids for human translators for many years to come. By contrast, the translating function may be automated in the case of diffusion-translation and screening-translation. To fix our vocabulary, we would like to take the term machine-assisted translation (MAT) as covering all techniques for automating the translation activity. The term human-aided machine translation (HAMT) should be reserved for the techniques which rely on a real automation of the translating function, with some human intervention in preedition, postedition or interaction. The term machine-aided human translation (MAHT) concerns machine aids for translators or revisors and is the topic of section 8.4.

MT for Screening Purposes

Around 1949, MT projects were launched first in the US, and soon thereafter in the USSR. They were motivated by the growing needs for intelligence gathering. They gave rise to the first MT screening systems. The goal of such systems is to produce large volumes of rough translations automatically, quickly and cheaply. The quality of the rough translations obtained is not essential. The output can be used to get an idea of the content. If the user wants a good translation of a part which looks interesting, he simply asks a human translator (who in general will judge the machine output to be too bad to bother with revision).

What is essential is that, in order to keep costs low, no professional translator or revisor should be used. Preedition should be reduced to confirming system proposals for separating figures, formulae, or sentences. Postedition, if any, should consist only of formatting operations. The need for screening MT is still current. However, civil uses (gathering technological, economic and financial information) are now predominant over military uses. Examples of working systems are SYSTRAN (Russian-English in the US and several language pairs at the EC), ATLAS-II (Japanese-English for the EC), and CAT from Bravice, used to access Japanese data bases in English [SG87].

Users can get access to these systems from terminals (even minitels), standard PCs or Macintoshes connected to a network. In the last few years, stand-alone configurations have appeared on PCs and workstations. We briefly describe the different access modes:

Access to a Server:

In France, Systran SA commercializes an MT server via the minitel network (6 to 7 million of these relatively dumb terminals are installed in French homes). This service gives access to several Systran language pairs. This system can meet users' expectations if used for screening purposes (translation into the mother tongue). At the European Commission, Systran has also been used since the end of 1976. These translations are now distributed as they stand to interested readers, instead of being revised by human translators. With that change, the amount of text going through MT has suddenly increased from 2,000 pages in 1988 to 40,000 in 1989 to 100,000 in 1993 (the total number of pages translated varying from 800,000 to 1,000,000 to 1,500,000). We should also mention the growing use of PCs connected to computer networks for getting access to rough MT translations of textual data bases (economic for NHK, scientific and technical at JICST, etc.), sometimes transcontinentally [SG87].
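The page counts quoted above for the European Commission can be turned into shares of the total translation volume. The short sketch below only restates those figures; the variable and function names are illustrative assumptions, not part of any system described here.

```python
# Page counts quoted in the text for the European Commission.
mt_pages = {1988: 2_000, 1989: 40_000, 1993: 100_000}
total_pages = {1988: 800_000, 1989: 1_000_000, 1993: 1_500_000}


def mt_share(year):
    """Percentage of all translated pages that went through MT in a given year."""
    return 100 * mt_pages[year] / total_pages[year]


for year in sorted(mt_pages):
    print(f"{year}: {mt_share(year):.2f}% of pages went through MT")
```

On these figures, MT remains a small fraction of the total volume, but its share grows from a quarter of a percent to several percent over five years.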

Integrated Stations:

Hardware has become powerful and cheap enough to run some MT systems on a PC, possibly coupled with an OCR. These systems include very restricted systems for diffusion, such as METEO on PC, and some systems for screening, such as Translator by Catena on Macintosh. However, at this point, the size of the dictionaries and the sophistication (and associated computational cost) of the underlying tools make workstations mandatory for the majority of currently available commercial systems, but this is bound to change soon.

MT for Diffusion Purposes

Work on diffusion MT, or MT for the revisor, began when the first interactive systems appeared. The aim is to automate the production of professional quality translations by letting the computer produce the first draft. Hence, the MT system must be designed to produce raw translations good enough that professional revisors will agree to postedit them, and that overall costs and delays are reduced. That is possible only if the system is specialized to texts of a certain style and domain (``suboptimization approach'' in L. Bourbeau's terminology [BCG90,LB88]). Political, scientific and industrial decision makers, as well as the public at large, often envisage that arrangement (pure MT followed by postedition) as the only possible one.

About twenty systems are now commercially available. About fifteen of them are Japanese (AS-Transac by Toshiba, ATLAS-II by Fujitsu, PIVOT by NEC, HICAT by Hitachi, SHALT-J by IBM-Japan, PENSÉ by OKI, DUET by Sharp, MAJESTIC by JICST, etc.), and handle almost exclusively the language pair Japanese/English. Other systems come from the U.S. (LOGOS, METAL, SPANAM), France (Ariane/aéro/F-E by SITE-B'VITAL, based on GETA's computer tools and linguistic methodology), or Germany (SUSY by IAI in Saarbruecken), and center on English, German or French, although mockups and prototypes exist for many other languages. Still others are large and operational, but not (yet?) commercially offered (JETS by IBM-Japan, LMT by IBM-US, ALT/JE by NTT, etc.).

What can be expected from these systems? Essentially, to answer growing needs in technical translation. On average, a 250-word page is translated in 1 hour and revised in 20 min. Hence, 4 persons (3 translators and 1 revisor) produce a finished translation at a rate of 3 pages per hour (p/h). Ideally, then, the translators could become revisors and the same 4 persons should produce 12 p/h. As it is, that is only an upper limit, and a more realistic figure is 8 p/h, if one counts a heavier revision rate of 30 min/p (after adequate training). Several users report overall gains of 40 to 50%. An extreme case is the METEO system [Cha89], which is so specialized that it can produce very high quality raw translations, needing only 3 text processor operations per 100 words translated. Another way of looking at the economics of MT is in terms of human effort: according to figures given by producers of MT systems [JEI89], the creation of a new (operational) system from scratch costs between 200 and 300 man-years with highly specialized developers. Also, the cost of adapting an existing system to a new domain and a new typology of texts is on the order of 5 to 10 man-years, which makes it impractical for fewer than 10,000 pages to translate. All counted, the breakeven point lies between 9,000 and 10,000 pages, an already large amount.
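The throughput figures above can be checked with a few lines of arithmetic. The sketch below assumes that the manual team of four consists of three translators and one revisor, and that with MT the same four people only revise; the function names are illustrative, not taken from any cited system.

```python
def pages_per_hour(people, minutes_per_page):
    """Pages per hour produced by a group in which each person needs
    `minutes_per_page` minutes for one page."""
    return people * 60 / minutes_per_page


def manual_team_rate(translators, revisors, translate_min=60, revise_min=20):
    """A page is finished only once it is both translated and revised,
    so the slower of the two stages limits the team."""
    return min(pages_per_hour(translators, translate_min),
               pages_per_hour(revisors, revise_min))


manual = manual_team_rate(translators=3, revisors=1)  # human translation plus revision
ideal = pages_per_hour(4, 20)      # MT draft, everyone revises at 20 min/p
realistic = pages_per_hour(4, 30)  # MT draft, heavier revision at 30 min/p

print(manual, ideal, realistic)
```

Under these assumptions the three rates come out at 3, 12 and 8 p/h, matching the figures quoted in the text.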

This approach, then, is at present only feasible for large flows of homogeneous and computerized texts, such as user or maintenance manuals. An essential condition of success is that the team in charge of developing and maintaining the lingware (dictionaries, grammars) be in constant touch with the revisors, and if possible with the authors of the documents to be translated. A good example in this respect is the Pan American Health Organization (PAHO) [VL88], with its systems ENGSPAN and SPANAM.

Users should consider this kind of MT system in the same way they consider expert systems. Expert systems can be developed by third parties, but it is essential for users to master them in order to let them evolve satisfactorily and to use them to best effect.

As the MT systems designed for diffusion purposes are computationally very heavy, they have been developed on mainframes. The situation is changing rapidly, however. Since powerful PCs are becoming widely available, they are now replacing terminals. Although many vendors offer specialized editors, on terminals or on PCs, there is a trend to let revisors work directly with their favorite text processor (such as Word, WordPerfect, WordStar, FrameMaker, Interleaf, Ventura, etc.) and to add specific functionalities as tools (such as Mercury/Termex or WinTool). But this technique is not yet able to offer all the functionalities of specialized editors (such as showing corresponding source and target phrases in inverse video, or doing linguistic alignment, etc.).

As for hardware, the METAL system commercialized by Siemens runs on a LISP machine, while revision is done on a kind of PC. It also seems that the ATLAS-II, PIVOT, and HICAT systems are still running on mainframes when used in-house for the translation of technical documentation, or externally by translation offices submitting possibly preedited material. In France, SITE-B'Vital has ported the Ariane-G5 MT system generator (not yet the development environment) to Unix-based workstations, but the current use is from a PC under Word accessing an MT server running on an IBM 9221 minicomputer. Finally, there is now a commercial offer for diffusion MT systems on workstations (Toshiba, Sharp, Fujitsu, NEC). About 3,000 machines in total had been sold in Japan by April 1992.

Systems used for diffusion MT are characterized, of course, by their specialization for certain kinds of texts (grammatical heuristics, terminological lexicons), but also by the richness of the tools they offer for preediting, postediting and stylistic system control (that is possible because intended users are bilingual specialists). They all include facilities to build terminological user dictionaries.

8.3.2 Four Main Situations in the Future

We anticipate that users of MT systems will increasingly be non-professionals, that is, occasional translators or monolingual readers. According to the linguistic competence of the user and to whether he works in a team or alone, we envisage four types of situations in the medium-term future, say, by the year 2000.

Individual Screening Translation Workstations:

Servers should continue to coexist with integrated solutions on PCs or workstations. Servers look appropriate for all situations where the same information is likely to be required by many persons, and is already available in computer-readable form (textual data bases, flow of short-lived messages such as weather bulletins or stock exchange notices, computerized libraries, etc.). Translation may be performed once, possibly in advance, and some amount of quick revision may even be performed. It is also possible to analyze the text typology and to use corresponding specialized versions of the MT system. Large-spectrum systems will no doubt be ported to the more powerful PCs which will soon be available.

In each case, we can expect environments to be generic. The only difference between the two solutions will be the required computer power. For accessing a server, basic PCs already suffice. But running MT systems requires more power, simply because small improvements in output quality and ergonomics will continue to require a lot of computational resources, and because the basic software tools also require ever more computer resources.

Occasional Translation:

Current tools will no doubt be improved in terms of speed, ergonomics and functionality. As far as ergonomics is concerned, we envisage that the translator's aids will work in the background and continuously offer help in windows associated with windows of the current application (text processor, spreadsheet, etc.). This is beginning to be possible, at least on Macintoshes, where different applications can communicate.

New functionalities should include more aids concerning the target language, in particular paraphrasing facilities and better tools for checking spelling, terminology, grammar, and style. They may even include some MT aids, not aiming at translating whole paragraphs or sentences, but rather at proposing translations for simple fragments, perhaps in several grammatical forms that seem possible in the context (case, number, person, tense, etc.).

Individual Professional Translation:

It can be envisaged that freelance translators will make increasing use of communication facilities, to retrieve terminology, to communicate with authors, or to submit parts of their workload to some MT system. Perhaps they will even have computer tools to help them determine which MT system accessible over the network, if any, would be most suitable for the text currently at hand. Current research in example-based MT will perhaps lead to much better tools for accessing previous translations of similar passages. As far as hardware is concerned, professional freelance translators should increasingly equip themselves with comfortable, but not too expensive configurations, such as middle-range PCs with large screens, CD-ROMs, and lots of disk space.

Industrial Professional Translation:

Industrial translation aims at a very high quality of fairly long documents. That is why the raw translation job (first draft) is usually divided among several translators, and why there is often more than one revision step. If MT is introduced, the revision job still has to be divided among several persons. There is a need for managing this collective effort. Hence, we can anticipate that this kind of translation will be organized around a local network, each translator/revisor working on a powerful PC, and accessing one or more MT servers, a terminology server, an example server (access to available parallel texts), etc., all being controlled by a senior translator using reserved managing facilities on his PC.

8.3.3 Future Directions

Of the four types of users (screener, occasional translator, freelance translator, industrial translator), only the first and fourth can already use existing MT technology in a cost-effective way. The third will probably also be able to use it by the year 2000. But there is still a fifth possibility, which is now at the research stage, that of MT for monolingual writers, or personal MT. See, e.g., [Boi86,BB93,CHH87,Hua90,MWO90,Sad89,STJ90,Tom86,Weh92,WWC86,WC88].

There is actually a growing need to translate masses of documents, notes, letters, etc., in several languages, especially in the global market. People are very conscious that they waste a lot of time and precision when they read or write texts in another language, even if they master it quite well. To take one language like English as the unique language of communication is not cost-effective. There is a strong desire to use one's own language, while of course trying to learn a few others for personal communication and cultural enrichment.

The idea behind this new kind of MT system is that users will agree to spend a lot of time interacting with the machine to get their texts translated into one or more languages, with a guaranteed high quality of the raw output. Engineers or researchers accustomed to painfully translating (or trying to translate) their prose into a foreign language (very often English, of course) would perhaps prefer to spend about the same time in such interaction, that is, 60 to 90 min per page, and get their text translated into all the languages of their correspondents. The system would negotiate the text with the author, in order to normalize it according to changeable parameters (style, terminology, etc.), and get a correct abstract representation of it (a so-called deep or intermediate structure) by asking questions to remove all ambiguities. Then, current technology could be applied to produce quality texts, needing no revision as far as grammaticality is concerned (the content is guaranteed to be correct because of the indirect preedition performed by the author himself, but the form and style would certainly be improvable).

This is of course another version of the old idea of interactive translation, proposed time and again since the first experiments by Kay and Kaplan in the sixties at the Rand Corporation (MIND system, [Kay73]). We attribute the relative failure of this approach to the fact that the user felt like a slave of the machine, that the texts were supposed to be sacred and unchangeable, and that the questions asked were at the same time very specialized and quite unsettling. We hope that the time is now ripe for yet another attempt, using the latest advances in ergonomics, AI methods for designing intelligent dialogues, and improved linguistic technology. One of the most challenging aspects of that approach is actually the need to express very sophisticated linguistic notions (such as modality, aspect, etc.) in a way understandable by users with no particular training in linguistics or translatology, and no knowledge of the target language(s). Some computer firms are already working on that concept, and may propose products well before the year 2000. But it will be a long time until it is possible to buy off-the-shelf multilingual systems of that kind, because of the tremendous amount of lexical and grammatical variety which is necessary if one does not want to restrict the domain and typology.

It will of course be possible to put a whole system of that kind on a very powerful PC. But an essential ingredient of success, we think, is that the user never be forced to wait, or to answer a question before being allowed to proceed with what he is doing. In other words, the system should simply tell (or better, show) that there are some questions waiting to be answered before translation can proceed on some fragments of the text (or hypertext). Then, an attractive solution is to use a comparatively cheap PC as a workstation, with a periodic connection to an MT server (exactly as is done nowadays by e-mail environments).
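The non-blocking interaction described above can be illustrated with a small sketch. It is not a description of any existing personal MT system: the classes `Fragment` and `Session`, the sample sentences, and the `send` callback standing in for the periodic connection to an MT server are all hypothetical. The point is only that unanswered questions are queued per fragment, the author answers them whenever convenient, and each periodic connection submits just the fragments that are ready.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A piece of text plus the disambiguation questions still open on it."""
    text: str
    pending: deque = field(default_factory=deque)  # unanswered questions
    answers: dict = field(default_factory=dict)    # question -> author's reply
    sent: bool = False                             # already submitted to the server?

    def ready(self):
        return not self.pending


class Session:
    """Hypothetical author-side session: never blocks on a question."""

    def __init__(self):
        self.fragments = []

    def add(self, text, questions=()):
        frag = Fragment(text, deque(questions))
        self.fragments.append(frag)
        return frag

    def waiting(self):
        """Fragments to flag on screen as having open questions."""
        return [f for f in self.fragments if not f.ready()]

    def answer(self, frag, reply):
        """Record the author's reply to the oldest open question."""
        question = frag.pending.popleft()
        frag.answers[question] = reply

    def sync(self, send):
        """Periodic connection: submit only fragments with no open questions."""
        count = 0
        for frag in self.fragments:
            if frag.ready() and not frag.sent:
                send(frag.text, frag.answers)
                frag.sent = True
                count += 1
        return count


session = Session()
session.add("The pilot checked the instruments.")
ambiguous = session.add("He saw her duck.", ["Is 'duck' a noun or a verb?"])

outbox = []  # stands in for the MT server
first = session.sync(lambda text, answers: outbox.append(text))
waiting_before = len(session.waiting())

session.answer(ambiguous, "noun")  # answered later, at the author's convenience
second = session.sync(lambda text, answers: outbox.append(text))
```

In this sketch the first connection sends the unambiguous sentence straight away, while the ambiguous one is merely flagged and goes out at the next connection after its question has been answered.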


