John Bateman
GMD, IPSI, Darmstadt, Germany
Although crucial to the entire enterprise of automatic text generation, deep generation remains a collection of activities lacking a clear theoretical foundation at this time. The most widely accepted views on what constitutes deep generation are already exhausted by a small number of techniques, resources and algorithms revealing as many problems as they can really claim to solve. For these reasons, recent research work in text generation centers on aspects of deep generation and it is here that serious breakthroughs are most needed. Whereas the goal of deep generation is to produce specifications of sufficiently fine granularity and degree of linguistic abstraction to drive surface generators, how it is to do so, and from what starting point, remains unclear.
Although deep generation is most often seen as notionally involving two subtasks---selecting the content for a text and imposing an appropriate linear order on that content's expression---it is now usually accepted that this decomposition is problematic. The subtasks are sufficiently interdependent as to make such a decomposition questionable. Linear order is achieved by the intermediate step of constructing a recursive text structure, typically the province of text planning. The two standard methods for constructing text structure, text schemata (e.g., [McK85,McC86,RK92,Par93b]) and rhetorical structuring (e.g., [MT87,Hov93,MP93]), both combine content selection and textual organization.
Text schemata describe text on the model of constituency. A text is defined in terms of a macro structure with constituents given by rhetorical predicates, such as Identification, Constituency, and Analogy. Individual rhetorical predicates generally include both constraints on the information they express and particular surface realization constraints. Rhetorical predicates are combined in fixed configurations, the text schemata. The most commonly cited problems with text schemata are their rigidity and lack of intentional information (cf. [MP93]): i.e., if an identification predicate appears, there is no record as to why a speaker has selected this predicate. This is particularly problematic for dialogue situations where breakdowns can occur. Despite these problems, however, schemata are still sometimes selected on the basis of their simplicity and ease of definition (cf. [RK92]).
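The constituency view can be illustrated with a minimal sketch in which a schema is a fixed sequence of rhetorical predicates, each pairing a content constraint with a realization constraint. Everything in it is an assumption made for illustration and is not drawn from any of the cited systems: the toy knowledge base `KB`, the two predicate definitions, the template strings, and the `instantiate` function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical toy knowledge base: entity -> attribute -> value.
KB: Dict[str, dict] = {
    "ship": {"isa": "water-going vehicle", "parts": ["hull", "deck", "engine"]},
}


@dataclass
class RhetoricalPredicate:
    name: str
    # Content constraint: returns the selected information, or None if
    # the knowledge base offers nothing this predicate can express.
    select: Callable[[dict], Optional[object]]
    # Surface realization constraint, reduced here to a string template.
    realize: Callable[[str, object], str]


IDENTIFICATION = RhetoricalPredicate(
    name="Identification",
    select=lambda facts: facts.get("isa"),
    realize=lambda entity, value: f"A {entity} is a {value}.",
)

CONSTITUENCY = RhetoricalPredicate(
    name="Constituency",
    select=lambda facts: facts.get("parts"),
    realize=lambda entity, value: f"It has the following parts: {', '.join(value)}.",
)

# A schema is a fixed configuration of predicates (illustrative only).
DESCRIBE_SCHEMA: List[RhetoricalPredicate] = [IDENTIFICATION, CONSTITUENCY]


def instantiate(schema: List[RhetoricalPredicate], entity: str, kb: Dict[str, dict]) -> List[str]:
    """Fill each predicate of the schema in order, skipping inapplicable ones."""
    facts = kb[entity]
    sentences = []
    for predicate in schema:
        content = predicate.select(facts)
        if content is not None:
            sentences.append(predicate.realize(entity, content))
    return sentences


print(instantiate(DESCRIBE_SCHEMA, "ship", KB))
```

Content selection and ordering are both fixed by the schema itself, and nothing in the result records why Identification was chosen, which is the missing intentional information noted above.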
In contrast to text schemata, rhetorical structures define the relational structure of a text. They show how a text can be recursively decomposed into smaller segments. These component segments are related to one another by means of a small set of rhetorical relations, such as elaboration, solutionhood, volitional cause, etc. Each such rhetorical relation is defined in terms of a distinctive set of constraints on the information presented in the segments related and in those segments' combination, on the speaker/hearer belief states, and on the effect that the speaker is attempting to achieve with the relation. It is generally assumed that imposing a rhetorical organization enables the information to be presented to be segmented into chunks small enough to be expressed by surface generators. Rhetorical organization is typically constructed by using a top-down goal-oriented planning strategy with the rhetorical relation definitions as plan operators. However, while earlier rhetorical structure approaches tended to equate rhetorical relations with discourse intentions, this does not appear equally appropriate for all rhetorical relations. Those relations that are based on the informational content of the segments related underconstrain possible discourse intentions; for example, a circumstance relation can be given for many distinct discourse purposes. The best-developed versions of rhetorical structure-based text planning therefore separate out at least discourse intentions and rhetorical relations and allow a many-to-many relation between them, as defined by the system's planning operators.
An example of such a plan operator from the system of [MP93] is the following:
The successful application of this operator has the effect that a state of the hearer being persuaded (a discourse intention) to do some act is achieved. The operator may be applied when the specified constraints hold. When this is the case, a rhetorical structuring involving motivation is constructed. Information selection is thus achieved as a side-effect of binding variables in the operator's constraints. Further such plan operators then decompose the rhetorical relation motivation until sequences of surface speech acts are reached. The Moore and Paris system contains approximately 150 such plan operators and is considered sufficiently stable for use in various application systems.
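The mechanism just described, in which content is selected as a side-effect of binding variables in an operator's constraints, can be sketched roughly as follows. This is not the operator from [MP93]. The fact base, the predicate names (`step`, `goal`, `persuaded`, `motivation`), the `PlanOperator` fields and all function names are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Bindings = Dict[str, str]

# Hypothetical fact base standing in for domain knowledge and a user model.
FACTS: Set[Fact] = {
    ("step", "replace-filter", "clean-engine"),
    ("goal", "user", "clean-engine"),
}


def match(pattern: Fact, fact: Fact, bindings: Bindings) -> Optional[Bindings]:
    """Unify a flat pattern with a ground fact; variables start with '?'."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result


def satisfy(constraints: List[Fact], facts: Set[Fact], bindings: Bindings) -> List[Bindings]:
    """Return every extension of `bindings` under which all constraints hold."""
    if not constraints:
        return [bindings]
    solutions = []
    for fact in sorted(facts):
        extended = match(constraints[0], fact, bindings)
        if extended is not None:
            solutions.extend(satisfy(constraints[1:], facts, extended))
    return solutions


@dataclass
class PlanOperator:
    effect: Fact             # discourse intention achieved by the operator
    constraints: List[Fact]  # conditions that must hold for it to apply
    nucleus: Fact            # rhetorical structuring that is constructed


def substitute(pattern: Fact, bindings: Bindings) -> Fact:
    return tuple(bindings.get(term, term) for term in pattern)


# Illustrative operator: persuade the hearer to do an act by motivating it
# with a goal that the hearer already holds and that the act is a step towards.
PERSUADE = PlanOperator(
    effect=("persuaded", "?hearer", "?act"),
    constraints=[("step", "?act", "?goal"), ("goal", "?hearer", "?goal")],
    nucleus=("motivation", "?act", "?goal"),
)


def apply_operator(op: PlanOperator, goal: Fact, facts: Set[Fact]) -> List[Fact]:
    """Expand a discourse goal into rhetorical subgoals, one per binding set."""
    bindings = match(op.effect, goal, {})
    if bindings is None:
        return []
    return [substitute(op.nucleus, b) for b in satisfy(op.constraints, facts, bindings)]


print(apply_operator(PERSUADE, ("persuaded", "user", "replace-filter"), FACTS))
```

Here `?goal` is never supplied by the caller. It is bound to `clean-engine` while the constraints are being satisfied, which is the sense in which information selection falls out of operator application. A fuller planner would go on to decompose the resulting `motivation` subgoal with further operators until surface speech acts are reached.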
Particular text schemata are associated with specific communicative intentions (such as answering a specified user-question or constructing a specified text-type) directly. Rhetorical relations are included as the possible expansions of plan operators with communicative intentions as their effects. The intentions employed are typically defined by an application system or a research interest---for example, [Sut91] presents a useful set for generating pedagogically adequate explanations, while others [McK85,RML92] adopt sets of possible responses to questions addressed to databases. The lack of clear definitions for what is to be accepted as an intention constitutes a substantial theoretical problem.
Whereas text schemata, which are now generally interpreted as pre-compiled plan sequences, and rhetorical structuring impose text structure on information, there are cases where it is argued that it is better for the information to be expressed to impose its structure more freely on text. Such data-driven approaches (cf. [Hov88b,KKR91,Sut91,Met91,McD92]) allow an improved opportunistic response to the contingencies of particular generation situations. Data-driven critics can be combined with the top-down planning of rhetorical structures in order to improve structures according to aggregation rules [Hov93] or text heuristics [SdS90]. A variation on data-driven content selection is offered by allowing transformation of the information itself, by means of logical inference rules defined over the knowledge base (e.g., [Hor90]).
Finally, a further active area of research is the addition of dynamic constraints on the construction of rhetorical structures. Two examples of such constraints are the use of focus [MC91] and the use of thematic development [HLM92] to direct selection among alternative rhetorical organizations.
Although an increasing number of systems find the use of rhetorical relations, augmented in the ways described above, an effective means of planning text, unclarities in the definitions of rhetorical relations and weaknesses in their processing schemes result in some inherent limitations. These limitations are often hidden in specific contexts of use by hardwiring decisions and constraints that would in the general case need to be explicitly represented as linguistic resources and decisions. Success in the particular case should therefore always be re-considered in terms of the cost of re-use.
The selection of appropriate granularities for the presentation of information remains an unsolved problem. Information will be packaged into units depending on contingencies of that information's structure, on the text purpose, on the expected audience, on the writer's biases, etc. This general aggregation problem requires solutions that go beyond specific heuristics.
Also problematic is the assumption that a rhetorical structure can decompose a text down to the granularity of inputs required for surface generators. Current systems impose more or less ad hoc mappings from the smallest segments of the rhetorical structure to their realizations in clauses. Much fine-scaled text flexibility is thus sacrificed (cf. [Met91]); this also reduces the multilingual effectiveness of such accounts.
Finally, algorithms for deep generation remain in a very early stage of development. It is clear that top-down planning is not sufficient. The interdependencies between many disparate kinds of information suggest the application of constraint-resolution techniques [PM91] (as shown in the example plan operator given above), but this has not yet been carried out for substantial deep generation components. The kinds of inferences typically supported in deep generation components are also limited, and so more powerful inference techniques (e.g., abduction [LO92]; decompositional, causal-link planning [YMP94]) may be appropriate.
Computational components responsible for deep generation are still most often shaped by their concrete contexts of use, rather than by established theoretical principles. The principal problem of deep generation is thus one of uncovering the nature of the necessary decisions underlying textual presentation and of organizing the space of such decisions appropriately. It is crucial that methodologies and theoretical principles be developed for this kind of linguistic description.
Furthermore, current work on more sophisticated inferencing capabilities needs to be brought to bear on deep generation. Important here, however, is to ensure that this is done with respect to sufficiently complex sources of linguistic constraint. Approaches rooted in mainstream (computational) linguistics posit fewer linguistic constraints in favour of more powerful inferencing over common sense knowledge. [Shi93], for example, divides generation generally into the generator (i.e., surface generator: mapping semantics to syntax) and the reasoner (the rest: pragmatics), whereby inferences are allowed to blend into common sense reasoning. This leaves no theoretically well-specified space of linguistic decisions separate from general inferential capabilities. The consequences of this for generation are serious; it is essential that more structured sources of constraint be made available if generation is to succeed.
Very rich, but computationally underspecified, proposals in this area can be found in functional approaches to language and text (cf. [Mar92]); results here suggest that the space of linguistic text organizational decisions is highly complex---similar to the kind of complexity found within grammars and lexicons. One methodology to improve the status of such accounts is then to use the control requirements of grammars and semantics as indications of the kinds of distinctions that are required at deeper, more abstract levels of organization (cf. [Mat87,Bat91,McD93]). The richer the grammatical and semantic starting points taken here, the more detailed hypotheses concerning those deeper levels become. This then offers an important augmentation of the informationally weak approaches from structural linguistics. Sophisticated inferential capabilities combined with strong sources of theoretically motivated linguistic constraints appear to offer the most promising research direction. This is also perhaps the only way to obtain an appropriate balance between fine detail and generality in the linguistic knowledge proposed. New work in this area includes that of the ESPRIT Basic Research Action DANDELION (EP6665).
A further key problem is the availability of appropriately organized knowledge representations. Although in research the generation system and the application system are sometimes combined, this cannot be assumed to be the case in general. The information selected for presentation will therefore be drawn from a representational level which may or may not have some linguistically relevant structuring, depending on the application or generation system architecture involved. This information must then be construed in terms that can be related to some appropriate linguistic expression and, as [McD94] points out with respect to application systems providing only raw numerical data, this latter step can be a difficult one in its own right. More general techniques for relating knowledge and generation intentions can only be provided if knowledge representation is guided more by the requirements of natural language. It is difficult for a knowledge engineer to appreciate just how inadequate a domain model that is constructed independently of natural language considerations---although possibly highly elegant and inferentially adequate for some application---typically reveals itself to be when natural language generation is required (cf. [Nov91]). If text generation is required, it is necessary for this to be considered at the outset in the design of any knowledge-based system; otherwise expensive redesign or limited text generation capabilities will be unavoidable.