A Dictionary of Nominal Complements

Ralph Grishman
Catherine Macleod

Dept. of Computer Science
New York University

CONTACT INFORMATION

715 Broadway, 7th Floor
New York, NY 10003
Phone: (212) 998-3497 (Grishman), (212) 998-3491 (Macleod)
Fax: (212) 995-4123
Email: grishman@cs.nyu.edu, macleod@cs.nyu.edu

WWW PAGE

http://cs.nyu.edu/cs/projects/proteus

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

natural language, English, dictionaries, lexicons, nominalizations, complements

PROJECT SUMMARY

Central to the task of understanding a natural language text is determining its predicate-argument structure: the "who did what to whom" information about the text. Determining this structure requires knowledge about the argument structure of the words in the text. For example, one needs to know that "give" can appear with three arguments, and that these may be realized as "x gave y to z" or "x gave z y". To process everyday English text, one needs this information for thousands of English words.
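
As a rough illustration of the kind of knowledge involved, the two realizations of "give" can be written down as patterns and matched against a tokenized sentence. This is only a sketch under simplifying assumptions: every argument is a single token, the role names (agent, theme, recipient) and the `VERB_FRAMES` table are hypothetical, and none of it reflects the format of any actual dictionary.

```python
# Hypothetical toy lexicon (illustrative only, not a real dictionary format).
# Slots starting with "=" are literal words; all other slots are role names.
VERB_FRAMES = {
    "give": [
        ["agent", "=gave", "theme", "=to", "recipient"],  # x gave y to z
        ["agent", "=gave", "recipient", "theme"],         # x gave z y
    ],
}


def predicate_arguments(verb, tokens):
    """Return a role -> token mapping for the first frame that fits, else None.

    Assumes each argument is exactly one token, which real text rarely obeys.
    """
    for frame in VERB_FRAMES.get(verb, []):
        if len(frame) != len(tokens):
            continue
        roles = {}
        for slot, token in zip(frame, tokens):
            if slot.startswith("="):
                if slot[1:] != token:
                    break  # literal word mismatch: try the next frame
            else:
                roles[slot] = token
        else:
            return roles  # every slot matched
    return None


prepositional = predicate_arguments("give", ["Mary", "gave", "books", "to", "John"])
double_object = predicate_arguments("give", ["Mary", "gave", "John", "books"])
print(prepositional == double_object)
```

Both word orders yield the same role assignment, which is the point of recording argument structure in the lexicon rather than relying on surface order.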

Several broad-coverage resources of this type are available for English verbs. Commercial dictionaries, including the Longman Dictionary of Contemporary English and the Oxford Advanced Learner's Dictionary, provide quite detailed information and have been used for natural language analysis. Comlex Syntax, developed at NYU specifically for natural language processing, provides even more detailed information about the syntax of verbs.

There is no comparably rich resource for noun arguments. Under this grant, NYU is developing a design for such a dictionary, and creating entries for approximately 1600 commonly occurring nouns which take arguments. For each noun, the entry describes the syntactic structure of its arguments (its "complement structure"). This includes such structures as

a history of erratic behavior
a lecture on how to get rich
the absolution of the sinner by the priest
the acquisition of Nabisco for $3 billion

In addition, for nouns derived from verbs or adjectives ("nominalizations"), the dictionary lists correspondences between the arguments. This will allow a text processing program to relate a nominalized form, such as

the acquisition of Lotus by IBM for $500 million

to a verbal form, such as

IBM acquired Lotus for $500 million
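
The correspondence between the two forms can be sketched in code. The following is a minimal illustration, not the actual entry format: the `NOMINALIZATIONS` table, its field names, and the role labels (subject, object, price) are all assumptions made for this example, and the noun phrase is assumed to be already split into its prepositional phrases.

```python
# Hypothetical entry layout (an assumption for illustration, not the real
# dictionary format): each nominalization names its verb and says which
# verbal argument each prepositional complement corresponds to.
NOMINALIZATIONS = {
    "acquisition": {
        "verb": "acquire",
        "pp_to_role": {"of": "object", "by": "subject", "for": "price"},
    },
}


def normalize_nominal(noun, pps):
    """Map a nominalization and its PP complements to a verbal structure.

    `pps` maps each preposition to the phrase it introduces; prepositions
    the entry does not list are ignored.
    """
    entry = NOMINALIZATIONS[noun]
    structure = {"predicate": entry["verb"]}
    for prep, phrase in pps.items():
        role = entry["pp_to_role"].get(prep)
        if role is not None:
            structure[role] = phrase
    return structure


# "the acquisition of Lotus by IBM for $500 million"
from_noun = normalize_nominal(
    "acquisition", {"of": "Lotus", "by": "IBM", "for": "$500 million"}
)

# "IBM acquired Lotus for $500 million", written out by hand for comparison
from_verb = {
    "predicate": "acquire",
    "subject": "IBM",
    "object": "Lotus",
    "price": "$500 million",
}
print(from_noun == from_verb)
```

Once both forms reduce to the same structure, a text processing program can treat the nominal and verbal phrasings as reports of the same event.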

In our first year we have developed a design for entries in the new dictionary, Nomlex, and have adapted our menu-based entry program (initially created for Comlex) to the new entry structure. This program provides access to a large text concordance and allows the coders to capture relevant citations during the entry process. Using this tool, two linguistics graduate students have created a pilot dictionary with a few hundred entries for nominalized verbs.

PROJECT REFERENCES

Ralph Grishman, Catherine Macleod, and Adam Meyers. Comlex Syntax: Building a Computational Lexicon. Proc. COLING 94 (Int'l Conference on Computational Linguistics), Kyoto, Japan, August 1994.

Catherine Macleod, Adam Meyers, and Ralph Grishman. The Influence of Tagging on the Classification of Lexical Complements. Proc. 16th Int'l Conf. on Computational Linguistics (COLING-96), Copenhagen, August 1996, pp. 472-477.

Adam Meyers, Catherine Macleod, and Ralph Grishman. Standardization of the Complement/Adjunct Distinction. Proc. EURALEX 96 (Int'l Conf. on Lexicography), Gothenburg, Sweden, August 1996.

AREA BACKGROUND

Natural language provides the most natural vehicle for communicating complex ideas between people and machines, in cases where pointing or menu selection is not sufficient. Natural language provides great flexibility in referring to things which are not on the screen, to the past and future, and to relationships between objects. However, there have been (at least) two serious hurdles to the wider use of natural language for interactive systems. First, people don't like to type, so effective natural language interaction requires speech understanding; improvements in speech recognition and in the performance of personal computers are reducing this hurdle. Second, natural language is by its very nature unconstrained, so people can formulate the same request to the computer in many different ways. The challenge for natural language processing is to cope successfully with this wide variety of expression.

Effective natural language processing (NLP) requires a combination of a rich variety of knowledge sources (knowledge of lexical items, syntax, semantic structure, and discourse structure) and flexible, robust algorithms which can use this knowledge to determine the structure and meaning of an utterance. Research in NLP has been aided by the increased availability of machine-readable text corpora and linguistically annotated corpora, including corpora which can be used for system evaluation. Progress has been achieved in part through the use of simple analysis algorithms, such as finite-state methods and probabilistic methods. In particular, significant gains have been made through the use of trainable analysis algorithms, which can benefit from the large corpora now available.

AREA REFERENCES

James Allen, Natural Language Understanding. Benjamin/Cummings.

Eugene Charniak, Statistical Language Learning. MIT Press, 1993.

Ralph Grishman, Computational Linguistics: An Introduction. Cambridge University Press, 1986.