Chapter 4: Language Generation
Eduard Hovy
University of Southern California, Marina del Rey, California, USA
The area of study called natural language generation (NLG) investigates how computer programs can be made to produce high-quality natural language text from computer-internal representations of information. Motivations for this study range from entirely theoretical (linguistic, psycholinguistic) to entirely practical (for the production of output systems for computer programs). Useful overviews of the research are [DHRS92,PSM90,Kem87,BH92,MS87,MBG+81]. The stages of language generation for a given application, resulting in speech output, are shown in Figure 4.1.
Figure 4.1: The stages of language generation.
This section discusses the following:
No field of study can be described adequately using a single perspective. In order to understand NLG it is helpful to consider independently the tasks of generation and the process of generation. Every generator addresses one or more tasks and embodies one (or sometimes two) types of process.
One can identify three types of generator task: text planning, sentence planning, and surface realization. Text planners select from a knowledge pool what information to include in the output, and out of this create a text structure to ensure coherence. On a more local scale, sentence planners organize the content of each sentence, massaging and ordering its parts. Surface realizers convert sentence-sized chunks of representation into grammatically correct sentences.
Generator processes can be classified into points on a range of sophistication and expressive power, starting with inflexible canned methods and ending with maximally flexible feature combination methods. For each point on this range, there may be various types of implemented algorithms.
The simplest approach, canned text systems, is used in the majority of software: the system simply prints a string of words without any change (error messages, warnings, letters, etc.). The approach can be used equally easily for single-sentence and for multi-sentence text generation. Trivial to create, the systems are very wasteful.
Template systems, the next level of sophistication, are used as soon as a message must be produced several times with slight alterations. Form letters are a typical template application, in which a few open fields are filled in specified constrained ways. The template approach is used mainly for multisentence generation, particularly in applications whose texts are fairly regular in structure such as some business reports. The text planning components of the U.S. companies CoGenTex (Ithaca, NY) and Cognitive Systems Inc. (New Haven, CT) enjoy commercial use.
On the research side, the early template-based generator ANA [Kuk83] produced stock market reports from a news wire by filling appropriate values into a report template. More sophisticated, the multisentence component of TEXT [McK85] could dynamically nest instances of four stereotypical paragraph templates called schemas to create paragraphs. TAILOR [Par93a] generalized TEXT by adding schemas and more sophisticated schema selection criteria.
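The template idea can be illustrated with a minimal sketch: a fixed sentence with a few open fields that are filled in constrained ways. The template wording, the field names, and the function fill_report are hypothetical and chosen only for illustration; this is not a reconstruction of ANA or of any commercial system.

```python
from string import Template

# Hypothetical report template; the wording and field names are illustrative only.
REPORT = Template(
    "The $index closed at $close, $direction $change points from the previous session."
)


def fill_report(index: str, close: float, previous: float) -> str:
    """Fill the open fields of the template from two closing values."""
    change = close - previous
    # The only "decision" a template system makes: pick a value for a field.
    direction = "up" if change >= 0 else "down"
    return REPORT.substitute(
        index=index,
        close=f"{close:.2f}",
        direction=direction,
        change=f"{abs(change):.2f}",
    )


print(fill_report("Dow Jones average", 1230.50, 1225.25))
```

Every output has the same structure; only the field values vary, which is why the approach suits texts that are regular in form.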
Phrase-based systems employ what can be seen as generalized templates, whether at the sentence level (in which case the phrases resemble phrase structure grammar rules) or at the discourse level (in which case they are often called text plans). In such systems, a phrasal pattern is first selected to match the top level of the input (say, [SUBJECT VERB OBJECT]), and then each part of the pattern is expanded into a more specific phrasal pattern that matches some subportion of the input (say, [DETERMINER ADJECTIVES HEAD-NOUN MODIFIERS]), and so on; the cascading process stops when every phrasal pattern has been replaced by one or more words. Phrase-based systems can be powerful and robust, but are very hard to build beyond a certain size, because the phrasal interrelationships must be carefully specified to prevent inappropriate phrase expansions. The phrase-based approach has mostly been used for single-sentence generation (since linguists' grammars provide well-specified collections of phrase structure rules). A sophisticated example is MUMBLE [McD80,MMA+87], built at the University of Massachusetts, Amherst. Over the past five years, however, phrase-based multisentence text structure generation (often called text planning) has received considerable attention in the research community, with the development of the RST text structurer [Hov88a], the EES text planner [Moo89], and several similar systems [Dal90,Caw89,Sut93], in which each so-called text plan is a phrasal pattern that specifies the structure of some portion of the discourse, and each portion of the plan is successively refined by more specific plans until the single-clause level is reached. Given the lack of understanding of discourse structure and the paucity of the discourse plan libraries, however, such planning systems do not yet operate beyond the experimental level.
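The cascading expansion of phrasal patterns can be sketched as a short recursive procedure. The pattern table, the nested-dictionary input format, and the function expand are assumptions made for this illustration; they do not reproduce MUMBLE or any of the text planners cited above.

```python
# Hypothetical phrasal patterns: each symbol expands into more specific symbols.
PATTERNS = {
    "SENTENCE": ["SUBJECT", "VERB", "OBJECT"],
    "SUBJECT": ["NOUN-PHRASE"],
    "OBJECT": ["NOUN-PHRASE"],
    "NOUN-PHRASE": ["DETERMINER", "ADJECTIVES", "HEAD-NOUN"],
}


def expand(symbol, content):
    """Expand a pattern until every part has been replaced by words."""
    if symbol not in PATTERNS:
        # Leaf symbol: take its word(s) directly from the input.
        value = content.get(symbol, [])
        return [value] if isinstance(value, str) else list(value)
    words = []
    for part in PATTERNS[symbol]:
        sub = content.get(part)
        # Descend into the matching subportion of the input when there is one.
        words.extend(expand(part, sub if isinstance(sub, dict) else content))
    return words


# Assumed input format: a nested dictionary keyed by pattern symbols.
message = {
    "SUBJECT": {"DETERMINER": "the", "ADJECTIVES": ["small"], "HEAD-NOUN": "generator"},
    "VERB": "produces",
    "OBJECT": {"DETERMINER": "a", "ADJECTIVES": [], "HEAD-NOUN": "sentence"},
}
print(" ".join(expand("SENTENCE", message)))
```

Even in this toy form, the difficulty noted above is visible: every new pattern must be checked against all the others to make sure it is only selected where its expansion is appropriate.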
Feature-based systems represent, in a sense, the limit point of the generalization of phrases. In feature-based systems, each possible minimal alternative of expression is represented by a single feature; for example, a sentence is either POSITIVE or NEGATIVE, it is a QUESTION or an IMPERATIVE or a STATEMENT, its tense is PRESENT or PAST and so on. Each sentence is specified by a unique set of features. Generation proceeds by the incremental collection of features appropriate for each portion of the input (either by the traversal of a feature selection network or by unification), until the sentence is fully determined. Feature-based systems are among the most sophisticated generators built today. Their strength lies in the simplicity of their conception: any distinction in language is defined as a feature, analyzed, and added to the system. Their weakness lies in the difficulty of maintaining feature interrelationships and in the control of feature selection (the more features available, the more complex the input must be). No feature-based multisentence generators have been built to date. The most advanced single-sentence generators of this type include PENMAN [Mat83,MM85] and its descendant KPML [BMTW91], the Systemic generators developed at USC/ISI and IPSI; COMMUNAL [Faw92], a Systemic generator developed at Wales; the Functional Unification Grammar framework (FUF) [Elh92] from Columbia University; SUTRA [VHHJW80] developed at the University of Hamburg; SEMTEX [R86] developed at the University of Stuttgart; and POPEL [Rei91] developed at the University of the Saarland. The two generators most widely distributed, studied, and used are PENMAN/KPML and FUF. None of these systems is in commercial use.
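As a rough illustration of feature collection followed by realization, the sketch below chooses one feature from each of the three alternatives named above and then builds an English clause from the resulting set. The input format, the verb-form table, and both function names are hypothetical; real systems such as PENMAN or FUF use far larger feature networks or unification rather than hand-written conditionals, and the sketch assumes a third-person singular subject.

```python
def collect_features(message: dict) -> set:
    """Choose one feature from each alternative, based on the input."""
    features = set()
    features.add("NEGATIVE" if message.get("negated") else "POSITIVE")
    features.add(message.get("mood", "STATEMENT"))  # QUESTION, IMPERATIVE, STATEMENT
    features.add("PAST" if message.get("time") == "before-now" else "PRESENT")
    return features


def realize(message: dict, features: set) -> str:
    """Build a clause from the feature set (third-person singular subject assumed)."""
    verb = message["verb"]  # assumed table of forms: base, 3sg, past
    subject, obj = message["subject"], message["object"]
    aux = "did" if "PAST" in features else "does"
    negative = "NEGATIVE" in features
    end = "."
    if "IMPERATIVE" in features:
        words = (["do", "not"] if negative else []) + [verb["base"], obj]
    elif "QUESTION" in features:
        words = [aux, subject] + (["not"] if negative else []) + [verb["base"], obj]
        end = "?"
    elif negative:
        words = [subject, aux, "not", verb["base"], obj]
    else:
        finite = verb["past"] if "PAST" in features else verb["3sg"]
        words = [subject, finite, obj]
    text = " ".join(words)
    return text[0].upper() + text[1:] + end


message = {
    "subject": "the system",
    "verb": {"base": "print", "3sg": "prints", "past": "printed"},
    "object": "a warning",
    "negated": True,
    "time": "before-now",
}
print(realize(message, collect_features(message)))
```

The conditionals in realize already show the maintenance problem mentioned above: each added feature interacts with the existing ones, and the input must supply enough information to select it.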
It is safe to say that at the present time one can fairly easily build a single-purpose generator for any specific application, or with some difficulty adapt an existing sentence generator to the application, with acceptable results. However, one cannot yet build a general-purpose sentence generator or a non-toy text planner. Several significant problems remain without sufficiently general solutions:
Lexical Selection: Lexical selection is one of the most difficult problems in generation. At its simplest, this question involves selecting the most appropriate single word for a given unit of input. However, as soon as the semantic model approaches a realistic size, and as soon as the lexicon is large enough to permit alternative locutions, the problem becomes very complex. In some situations, one might have to choose among the phrases John's car, John's sports car, his speedster, the automobile, the red vehicle, the red Mazda for referring to a certain car. The decision depends on what has already been said, what is referentially available from context, what is most salient, what stylistic effect the speaker wishes to produce, and so on. A considerable amount of work has been devoted to this question, and solutions to various aspects of the problem have been suggested (see for example [Gol75,ER92,MRT93]). At this time no general methods exist to perform lexical selection. Most current generator systems simply finesse the problem by linking a single lexical item to each representation unit. What is required: Development of theories about and implementations of lexical selection algorithms, for reference to objects, events, states, etc., and tested with large lexica.
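To make one aspect of the problem concrete, the sketch below chooses among candidate phrases for the car example by preferring the shortest phrase that fits the intended referent and no other object in the context. This criterion, the property annotations, and the function select_phrase are assumptions introduced for illustration only; the sketch ignores salience, discourse history, and style, and is not a method taken from the work cited above.

```python
def fits(props: dict, entity: dict) -> bool:
    """True if every property conveyed by a phrase holds of the entity."""
    return all(entity.get(key) == value for key, value in props.items())


def select_phrase(candidates, referent, distractors):
    """Return the shortest phrase that fits the referent and no distractor."""
    usable = [
        phrase
        for phrase, props in candidates
        if fits(props, referent) and not any(fits(props, d) for d in distractors)
    ]
    # Ties are broken by candidate order, since min returns the first minimum.
    return min(usable, key=lambda p: len(p.split())) if usable else None


# Hypothetical annotations: the properties each phrase conveys.
candidates = [
    ("the automobile", {"type": "car"}),
    ("the red vehicle", {"type": "car", "color": "red"}),
    ("the red Mazda", {"color": "red", "make": "Mazda"}),
    ("John's car", {"type": "car", "owner": "John"}),
]
johns_car = {"type": "car", "owner": "John", "color": "red", "make": "Mazda"}
marys_car = {"type": "car", "owner": "Mary", "color": "red", "make": "Ford"}

print(select_phrase(candidates, johns_car, []))
print(select_phrase(candidates, johns_car, [marys_car]))
```

With no competing object, the least specific phrase suffices; once a second red car is in the context, only the phrases naming the owner or the make remain usable.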
Discourse Structure: One of the most exciting recent research developments in generation is the automated planning of paragraph structure. The state of the art in discourse research is described in chapter 6. So far no text planner exists that can reliably plan texts of several paragraphs in general. What is required: Theories of the structural nature of discourse, of the development of theme and focus in discourse, and of coherence and cohesion; libraries of discourse relations, communicative goals, and text plans; implemented representational paradigms for characterizing stereotypical texts such as reports and business letters; implemented text planners that are tested in realistic non-toy domains.
Sentence Planning: Even assuming the text planning problem solved, a number of tasks remain before well-structured multisentence text can be generated. These tasks, required for planning the structure and content of each sentence, include: pronoun specification, theme signaling, focus signaling, content aggregation to remove unnecessary redundancies, the ordering of prepositional phrases, adjectives, etc. An elegant system that addressed some of these tasks is described in [App85]. While to the nonspecialist these tasks may seem relatively unimportant, they can have a significant effect and make the difference between a well-written and a poor text. What is required: Theories of pronoun use, theme and focus selection and signaling, and content aggregation; implemented sentence planners with rules that perform these operations; testing in realistic domains.
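Content aggregation, one of the sentence planning tasks listed above, can be illustrated with a minimal sketch that merges consecutive clauses sharing a subject and verb. The clause representation (subject, verb, object triples) and both function names are hypothetical simplifications and do not describe the system in [App85].

```python
def aggregate(clauses):
    """Merge consecutive clauses that share subject and verb into one clause."""
    merged = []
    for subject, verb, obj in clauses:
        if merged and merged[-1][0] == subject and merged[-1][1] == verb:
            merged[-1][2].append(obj)  # same subject and verb: add to object list
        else:
            merged.append((subject, verb, [obj]))
    return merged


def render(merged):
    """Turn each merged clause into a sentence with a coordinated object list."""
    sentences = []
    for subject, verb, objects in merged:
        if len(objects) > 1:
            obj_text = ", ".join(objects[:-1]) + " and " + objects[-1]
        else:
            obj_text = objects[0]
        sentences.append(f"{subject} {verb} {obj_text}.")
    return " ".join(sentences)


clauses = [
    ("John", "bought", "a car"),
    ("John", "bought", "a boat"),
    ("Mary", "sold", "a house"),
]
print(render(aggregate(clauses)))
```

Without aggregation the same content would be rendered as three separate sentences, two of them repeating "John bought".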
Domain Modeling: A significant shortcoming in generation research is the lack of large well-motivated application domain models, or even the absence of clear principles by which to build such models. A traditional problem with generators is that the inputs are frequently hand-crafted, or are built by some other system that uses representation elements from a fairly small hand-crafted domain model, making the generator's inputs already highly oriented toward the final language desired. It is very difficult to link a generation system to a knowledge base or database that was originally developed for some non-linguistic purpose. The mismatches between the representation schemes demonstrate the need for clearly articulated principles of linguistically appropriate domain modeling and representational adequacy (see also [Met90]). The use of high-level language-oriented concept taxonomies such as the Penman Upper Model [BMW90] to act as a bridge between the domain application's concept organization and that required for generation is becoming a popular (though partial) solution to this problem. What is required: Implemented large-size (over 10,000 concepts) domain models that are useful both for some non-linguistic application and for generation; criteria for evaluating the internal consistency of such models; theories on and practical experience in the linking of generators to such models; lexicons of commensurate size.
Generation Choice Criteria: Probably the problem least addressed in generator systems today is the one that will take the longest to solve. This is the problem of guiding the generation process through its choices when multiple options exist to handle any given input. It is unfortunately the case that language, with its almost infinite flexibility, demands far more from the input to a generator than can be represented today. As long as generators remain fairly small in their expressive potential, this problem does not arise. However, when generators start having the power of saying the same thing in many ways, additional control must be exercised in order to ensure that appropriate text is produced. As shown in [Hov88b] and [Jam87], different texts generated from the same input carry additional, non-semantic import; the stylistic variations serve to express significant interpersonal and situational meanings (text can be formal or informal, slanted or objective, colorful or dry, etc.). In order to ensure appropriate generation, the generator user has to specify not only the semantic content of the desired text, but also its pragmatic (interpersonal and situational) effects. Very little research has been performed on this question beyond a handful of small-scale pilot studies. What is required: Classifications of the types of reader characteristics and goals, the types of author goals, and the interpersonal and situational aspects that affect the form and content of language; theories of how these aspects affect the generation process; implemented rules and/or planning systems that guide generator systems' choices; criteria for evaluating appropriateness of generated text in specified communicative situations.
Infrastructure Requirements: The overarching challenge for generation is scaling up to the ability to handle real-world, complex domains. However, given the history of relatively little funding support, hardly any infrastructure required for generation research exists today.
The resources most needed to enable both high-quality research and large-scale generation include the following:
Longer-term Research Projects: Naturally, the number and variety of promising long-term research projects is large. The following directions have all been addressed by various researchers for over a decade and represent important strands of ongoing investigation:
Near- and Medium-term Applications with Payoff Potential: Taking into account the current state of the art and gaps in knowledge and capability, the following applications (presented in order of increasing difficulty) provide potential for near-term and medium-term payoff:
During the past two decades, language generation technology has developed to the point where it offers general-purpose single-sentence generation capability and limited-purpose multisentence paragraph planning capability. The possibilities for growth and development of useful applications are numerous and exciting. Focusing new research on specific applications and on infrastructure construction will help turn the promise of current text generator systems and theories into reality.