Ellipsis Resolution in English

Daniel Hardt

Department of Computing Science
Villanova University
Villanova, PA 19085-1699

CONTACT INFORMATION

Phone: (610) 519-7337
Fax: (610) 519-7889
Email: hardt@vill.edu

WWW PAGE

http://www.csc.vill.edu/~hardt

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Corpus, Ellipsis, Discourse, NLP

PROJECT SUMMARY

The objective of this research is to develop a system that reliably resolves certain forms of ellipsis in English. The system accepts syntactically annotated input and produces output in which elliptical expressions are resolved, either by producing a non-elliptical paraphrase or by linking the elliptical expression to its antecedent. The system is being developed using the Penn Treebank, a syntactically annotated corpus of several million words, containing a wide range of texts of varying styles. Three success criteria have been devised for evaluating the system against the choices of human coders: head overlap, head match, and exact match. For VP Ellipsis, the system's performance exceeds 90% using the head overlap criterion, which arguably provides the best picture of the system's effectiveness. The system is currently being extended to other forms of ellipsis, such as stripping and sluicing. The system would represent a practical solution to a problem that confronts virtually any Natural Language Processing application that attempts to process English in a realistic setting. In addition, the system could be used as a tool for further annotating treebanks with ellipsis resolution information. Finally, the project has resulted in a large amount of valuable data for theoreticians studying ellipsis and related phenomena.
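The three success criteria are named above but not defined in this summary. The following Python sketch shows one plausible reading, offered as an assumption rather than as the project's actual evaluation code: exact match requires the system and the coder to pick the same span of words, head match requires the same head verb, and head overlap requires that the head verb of either choice fall inside the other choice. The names Span, CRITERIA, and score, and the example sentence, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A candidate antecedent, given as token positions in a sentence (assumed encoding)."""
    start: int  # index of the first token in the antecedent
    end: int    # index one past the last token
    head: int   # index of the antecedent's head verb

    def contains(self, index: int) -> bool:
        return self.start <= index < self.end


def exact_match(system: Span, coder: Span) -> bool:
    # Assumed definition: both choices cover exactly the same words.
    return (system.start, system.end) == (coder.start, coder.end)


def head_match(system: Span, coder: Span) -> bool:
    # Assumed definition: both choices have the same head verb.
    return system.head == coder.head


def head_overlap(system: Span, coder: Span) -> bool:
    # Assumed definition: the head verb of one choice lies inside the other choice.
    return coder.contains(system.head) or system.contains(coder.head)


CRITERIA = {
    "head_overlap": head_overlap,
    "head_match": head_match,
    "exact_match": exact_match,
}


def score(pairs):
    """Return the fraction of (system, coder) pairs that satisfy each criterion."""
    if not pairs:
        raise ValueError("need at least one (system, coder) pair")
    return {
        name: sum(test(system, coder) for system, coder in pairs) / len(pairs)
        for name, test in CRITERIA.items()
    }


# Hypothetical example: "Mary wanted to leave early , and John did too ."
tokens = "Mary wanted to leave early , and John did too .".split()
coder_choice = Span(start=3, end=5, head=3)   # "leave early"
system_choice = Span(start=1, end=5, head=1)  # "wanted to leave early"
results = score([(system_choice, coder_choice), (coder_choice, coder_choice)])
```

Under these assumed definitions the criteria run from lenient to strict: in the example, the system's longer choice counts as correct under head overlap but not under head match or exact match.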

PROJECT REFERENCES

Nicholas Asher, Daniel Hardt, and Joan Busquets. 1997. Discourse Parallelism, Scope, and Ellipsis. Proceedings of the Seventh Conference on Semantics and Linguistic Theory. Palo Alto, CA.

Matthew Stone and Daniel Hardt. 1997. Dynamic Discourse Referents for Tense and Modals. Proceedings of the Second International Workshop on Computational Semantics. Tilburg, Netherlands.

Daniel Hardt. 1996. Centering in Dynamic Semantics. Proceedings of the Seventeenth International Conference on Computational Linguistics. Copenhagen, Denmark.

Daniel Hardt. 1997. An Empirical Approach to VP Ellipsis. Submitted for publication.

Daniel Hardt. 1996a. Dynamic Interpretation of VP Ellipsis. Submitted for publication.

Daniel Hardt. 1995. An Empirical Approach to VP Ellipsis. AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation.

Daniel Hardt. 1994. Sense and Reference in Dynamic Semantics. Proceedings of the Ninth Amsterdam Colloquium. Amsterdam, Netherlands.

Daniel Hardt. 1993. VP Ellipsis: Form, Meaning, and Processing. Ph.D. Dissertation. University of Pennsylvania.

Daniel Hardt. 1992. VP Ellipsis and Contextual Interpretation. Proceedings of the Fifteenth International Conference on Computational Linguistics. Nantes, France.

Daniel Hardt. 1992a. An Algorithm for VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.

Daniel Hardt. 1992b. Some Problematic Cases of VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.

Daniel Hardt. 1992c. VP Ellipsis and Semantic Identity. Proceedings of the Second Conference on Semantics and Linguistic Theory. Edited by Chris Barker and David Dowty. Columbus, OH.

Daniel Hardt. 1991. A Discourse Model Approach to VP Ellipsis. Proceedings AAAI Symposium on Discourse Structure in Natural Language Understanding and Generation. Asilomar, CA.

Daniel Hardt. 1991a. Towards a Discourse Level Account of VP Ellipsis. Proceedings of the 8th Eastern States Conference on Linguistics. G. Westphal, J. Dai and B. Ao (editors). Ohio State University.

AREA BACKGROUND

Elliptical expressions are a pervasive feature of ordinary English usage, and thus constitute a practical problem for any NLP system that would process English in a realistic setting. There has been a great deal of theoretical work investigating the underlying representations and mechanisms in a range of elliptical constructions (see Hardt 96a, Hardt 93, and references cited there). The current project is the first empirically oriented investigation of elliptical phenomena, relying on the syntactically annotated Penn Treebank to develop and test a system for resolving elliptical forms. An earlier manual approach to this problem is described in Hardt 92. Similar empirically oriented work has been pursued on the problem of pronoun resolution (see, for example, Hobbs 78, Walker 89, and Lappin and Leass 94, among many others).

AREA REFERENCES

Daniel Hardt. 1997. An Empirical Approach to VP Ellipsis. Submitted for publication.

Daniel Hardt. 1996. Centering in Dynamic Semantics. COLING 96.

Daniel Hardt. 1996a. Dynamic Interpretation of VP Ellipsis. Submitted for publication.

Daniel Hardt. 1993. VP Ellipsis: Form, Meaning, and Processing. Ph.D. Dissertation. University of Pennsylvania.

Daniel Hardt. 1992. An Algorithm for VP Ellipsis. Proceedings, 30th Annual Meeting of the Association for Computational Linguistics. Newark, DE.

Jerry Hobbs. 1978. Resolving pronoun references. Lingua, 44:311--338.

Shalom Lappin and Herbert J. Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics.

Marilyn Walker. 1989. Evaluating discourse processing algorithms. In Proceedings, 27th Annual Meeting of the ACL, Vancouver, Canada.

RELATED PROGRAM AREAS

4. Adaptive Human Interfaces.
5. Usability and User-Centered Design.
6. Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

The proposed project will result in a system that resolves a variety of well-defined forms of ellipsis in English. This might be of use in the area "Usability and User-Centered Design". It is often more natural for humans to use elliptical or reduced forms of input. The proposed system could be used to make this possible, if a basic syntactic structure for the input is provided. This could be done either through the use of a broad-coverage parser for English, or by restricting the input language to a parsable fragment of English. In either case, the possibility of elliptical input might enhance the "cognitive ergonomics" of the system. These issues might also be relevant for the areas of "Adaptive Human Interfaces" and "Intelligent Interactive Systems for Persons with Disabilities".