Dept. of Computer Science
New York University
Several broad-coverage resources of this type are available for English verbs. Commercial dictionaries, including the Longman Dictionary of Contemporary English and the Oxford Advanced Learner's Dictionary, provide quite detailed information and have been used for natural language analysis. Comlex Syntax, developed at NYU specifically for natural language processing, provides even more detailed information about the syntax of verbs.
There is no comparably rich resource for noun arguments. Under this grant, NYU is developing a design for such a dictionary, and creating entries for approximately 1600 commonly occurring nouns which take arguments. For each noun, the entry describes the syntactic structure of its arguments (its "complement structure"). This includes such structures as
a history of erratic behavior
a lecture on how to get rich
the absolution of the sinner by the priest
the acquisition of Nabisco for $3 billion
In addition, for nouns derived from verbs or adjectives ("nominalizations"), the dictionary lists correspondences between the arguments. This will allow a text processing program to relate a nominalized form, such as
the acquisition of Lotus by IBM for $500 million
to a verbal form
IBM acquired Lotus for $500 million
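The correspondence between noun and verb arguments can be illustrated with a minimal Python sketch. It is not the actual Nomlex entry format: the class `NominalizationEntry`, its fields, the role labels, and the function `to_verbal_form` are hypothetical names introduced here for illustration only. The sketch assumes that each complement of the noun is marked by a preposition and that the entry records which verbal argument that preposition corresponds to.

```python
from dataclasses import dataclass


@dataclass
class NominalizationEntry:
    """Hypothetical, simplified stand-in for a nominalization entry."""
    noun: str
    verb_past: str  # past-tense form of the related verb
    # Maps the preposition marking a noun complement to a verbal role.
    role_of_preposition: dict


# Illustrative entry for "acquisition"; the role labels are assumptions.
ACQUISITION = NominalizationEntry(
    noun="acquisition",
    verb_past="acquired",
    role_of_preposition={"of": "object", "by": "subject", "for": "for-phrase"},
)


def to_verbal_form(entry, complements):
    """Rebuild a simple clause from (preposition, phrase) noun complements."""
    roles = {entry.role_of_preposition[prep]: phrase
             for prep, phrase in complements}
    parts = [roles["subject"], entry.verb_past, roles["object"]]
    if "for-phrase" in roles:
        parts.append("for " + roles["for-phrase"])
    return " ".join(parts)


# Complements of "the acquisition of Lotus by IBM for $500 million".
sentence = to_verbal_form(
    ACQUISITION,
    [("of", "Lotus"), ("by", "IBM"), ("for", "$500 million")],
)
print(sentence)  # IBM acquired Lotus for $500 million
```

A real entry would have to cover many more complement types (possessives, clausal complements, and so on); the sketch only shows why listing the correspondences lets a program treat the nominal and verbal forms alike.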
In our first year we have developed a design for Nomlex entries and have adapted our menu-based entry program (initially created for Comlex) to the new entry structure. This program provides access to a large text concordance and allows the coders to capture relevant citations during the entry process. Using this tool, two linguistics graduate students have created a pilot dictionary with a few hundred entries for nominalized verbs.
Catherine Macleod, Adam Meyers, and Ralph Grishman. The Influence of Tagging on the Classification of Lexical Complements. Proc. 16th Int'l Conf. on Computational Linguistics (COLING-96), Copenhagen, August 1996, pp. 472-477.
Adam Meyers, Catherine Macleod, and Ralph Grishman. Standardization of the Complement/Adjunct Distinction. Proc. EURALEX 96 (Int'l Conf. on Lexicography), Gothenburg, Sweden, August 1996.
Effective natural language processing (NLP) requires a combination of a rich variety of knowledge sources (knowledge of lexical items, syntax, semantic structure, and discourse structure) and flexible, robust algorithms which can use this knowledge to determine the structure and meaning of an utterance. Research in NLP has been aided by the increased availability of machine-readable text corpora and linguistically-annotated corpora, including corpora which can be used for system evaluation. Progress has been achieved in part through the use of simple analysis algorithms, such as finite state methods and probabilistic methods. In particular, significant gains have been made through the use of trainable analysis algorithms, which can benefit from the large corpora now available.
Eugene Charniak, Statistical Language Learning. MIT Press, 1993.
Ralph Grishman, Computational Linguistics: An Introduction. Cambridge University Press, 1986.