
Chapter 3: Language Analysis and Understanding

3.1 Overview

Annie Zaenen & Hans Uszkoreit
Rank Xerox Research Centre, Grenoble, France
Deutsches Forschungszentrum für Künstliche Intelligenz
and Universität des Saarlandes, Saarbrücken, Germany

We understand larger textual units by combining our understanding of smaller ones. The main aim of linguistic theory is to show how these larger units of meaning arise out of the combination of the smaller ones. This combination is modeled by means of a grammar, and computational linguistics then tries to implement the process efficiently. It is traditional to subdivide the task into syntax and semantics: syntax describes how the different formal elements of a textual unit, most often the sentence, can be combined, and semantics describes how the interpretation is calculated.
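The division of labour between syntax and semantics can be illustrated with a deliberately tiny fragment. The sketch below assumes a toy world model and a hand-written lexicon (all names and data are hypothetical): the syntactic rules decide which pieces combine, and the semantic side computes the interpretation by function application. It is a minimal illustration of compositional interpretation, not a model of any particular linguistic theory.

```python
# Toy "world" against which meanings are evaluated (hypothetical data).
SLEEPERS = {"john"}
LOVES = {("mary", "john")}  # (lover, loved)

# Lexicon: each word is paired with a meaning.
# Proper names denote entities, intransitive verbs denote one-place
# predicates, transitive verbs denote curried relations (object first).
LEXICON = {
    "John": "john",
    "Mary": "mary",
    "sleeps": lambda x: x in SLEEPERS,
    "loves": lambda y: lambda x: (x, y) in LOVES,
}

def interpret_vp(words):
    """VP -> V | V NP: combine the verb meaning with its object, if any."""
    verb = LEXICON[words[0]]
    if len(words) == 1:
        return verb
    return verb(LEXICON[words[1]])

def interpret_s(sentence):
    """S -> NP VP: apply the VP meaning to the subject's meaning."""
    subject, *vp = sentence.split()
    return interpret_vp(vp)(LEXICON[subject])

print(interpret_s("John sleeps"))      # True
print(interpret_s("Mary loves John"))  # True
print(interpret_s("John loves Mary"))  # False
```

The meaning of each sentence is never stored anywhere; it is computed from the meanings of the words and the way the rules put them together.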

In most language technology applications the encoded linguistic knowledge, i.e., the grammar, is separated from the processing components. The grammar consists of a lexicon and rules that syntactically and semantically combine words and phrases into larger phrases and sentences. A variety of representation languages have been developed for encoding linguistic knowledge. Some of these languages are geared more towards conformity with formal linguistic theories; others are designed to facilitate certain processing models or specialized applications.
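To make this separation concrete, here is a minimal sketch assuming a toy grammar in Chomsky normal form (the grammar, category names and function name are hypothetical). The lexicon and rules are plain data; the processing component is a generic CKY recognizer that would work unchanged with any other grammar in the same format.

```python
from collections import defaultdict

# --- Linguistic knowledge: declarative data, no processing logic. ---
GRAMMAR_LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}
GRAMMAR_RULES = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
]

# --- Processing component: a generic CKY recognizer. ---
def cky_recognize(words, lexicon, rules, start="S"):
    n = len(words)
    # chart[(i, j)] holds the categories that can span words[i:j]
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexicon.get(w, set())
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        chart[(i, j)].add(parent)
    return start in chart[(0, n)]

print(cky_recognize("the dog saw the cat".split(), GRAMMAR_LEXICON, GRAMMAR_RULES))  # True
print(cky_recognize("dog the saw cat".split(), GRAMMAR_LEXICON, GRAMMAR_RULES))      # False
```

Because the recognizer never mentions particular words or categories, extending coverage means editing the data rather than the algorithm. Practical grammars also attach features and semantic annotations to their rules, which this sketch omits.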

Several language technology products on the market today employ annotated phrase-structure grammars, i.e., grammars with several hundred or several thousand rules describing different phrase types. Each of these rules is annotated with features and sometimes also with expressions in a programming language. Once such grammars reach a certain size, they become difficult to maintain, extend, and reuse. The resulting systems may be efficient enough for some applications, but they lack the processing speed needed for interactive systems (such as applications involving spoken input) or for systems that must process large volumes of text (as in machine translation).

In current research, a certain polarization has taken place. At one end, very simple grammar models are employed, e.g., various kinds of finite-state grammars that support highly efficient processing; some approaches do away with grammars altogether and use statistical methods to find basic linguistic patterns. These approaches are discussed in a later section. At the other end of the scale, we find a variety of powerful, linguistically sophisticated representation formalisms that facilitate grammar engineering. An exhaustive description of current work in that area would be well beyond the scope of this overview. The most prevalent family of grammar formalisms currently used in computational linguistics, constraint-based formalisms, is described briefly in its own section, as are approaches to lexicon construction inspired by the same view.
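Constraint-based formalisms are built around the unification of feature structures. The following minimal sketch assumes feature structures encoded as nested Python dictionaries with string atoms; the function name `unify` and the feature names are chosen for illustration. It deliberately omits structure sharing (reentrancy), types and disjunction, which real constraint-based formalisms provide.

```python
from typing import Optional, Union

FS = Union[str, dict]  # a feature structure: an atom or a dict of features

def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the most general structure compatible with both a and b,
    or None if they conflict. Reentrancy is not modeled."""
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None   # atoms unify only with equal atoms
    result = dict(a)                   # shallow copy; a is not modified
    for feat, b_val in b.items():
        if feat in result:
            merged = unify(result[feat], b_val)
            if merged is None:
                return None            # clash on a shared feature
            result[feat] = merged
        else:
            result[feat] = b_val       # feature only in b: just add it
    return result

# Subject-verb agreement stated as a constraint rather than as extra rules.
subj = {"cat": "NP", "agr": {"num": "sg", "per": "3"}}
verb_wants = {"agr": {"num": "sg"}}
print(unify(subj, verb_wants))  # {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(subj, {"agr": {"num": "pl"}}))  # None
```

A single agreement constraint of this kind replaces what an unannotated phrase-structure grammar would need separate singular and plural rules to express.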

Recent developments in the formalization of semantics are discussed in a separate section.

The computational issues related to different types of sentence grammars are discussed in another section. A further section evaluates how successful the different techniques are in providing robust parsing results, and section 3.2 addresses issues raised when units smaller than sentences need to be parsed.


