Ellipsis Resolution in English
Daniel Hardt
Department of Computing Science
Villanova University
Villanova, PA 19085-1699
CONTACT INFORMATION
Phone: (610) 519-7337
Fax: (610) 519-7889
Email: hardt@vill.edu
WWW PAGE
http://www.csc.vill.edu/~hardt
PROGRAM AREA
Speech and Natural Language Understanding.
KEYWORDS
Corpus, Ellipsis, Discourse, NLP
PROJECT SUMMARY
The objective of this research is to develop a system
that reliably resolves certain forms of ellipsis in English.
The system accepts syntactically annotated input, and produces
output in which elliptical expressions are resolved, either by
producing a non-elliptical paraphrase or by linking the elliptical
expression to its antecedent.
The system is being developed using
the Penn Treebank, a syntactically annotated corpus of several million
words, containing a wide range of texts of varying styles.
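The two output forms described above can be illustrated with a minimal
sketch. It is not the project's implementation: the EllipsisLink record,
the paraphrase function, and the example sentence are hypothetical, and
the sketch assumes that the ellipsis site and the antecedent span have
already been identified over a tokenized sentence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EllipsisLink:
    """Hypothetical record linking an elliptical expression to its antecedent."""
    site: int                    # index of the auxiliary that precedes the elided VP
    antecedent: Tuple[int, int]  # half-open token span of the antecedent VP


def paraphrase(tokens: List[str], link: EllipsisLink) -> List[str]:
    """Build a non-elliptical paraphrase by copying the antecedent after the site."""
    start, end = link.antecedent
    return tokens[: link.site + 1] + tokens[start:end] + tokens[link.site + 1 :]


# Assumed example sentence; "did" (index 6) marks the elided VP,
# and tokens 1..3 ("read the report") form the antecedent.
tokens = "Mary read the report and John did too".split()
link = EllipsisLink(site=6, antecedent=(1, 4))

print(" ".join(paraphrase(tokens, link)))
# Mary read the report and John did read the report too
```

The link record alone corresponds to the second output form (linking
the elliptical expression to its antecedent); the paraphrase is derived
from it by simple copying, which ignores any adjustment of verb
morphology.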
Three success criteria have been devised for evaluating the
system: head overlap, head match, and exact match. Each
compares the system's choice with the choices of human
coders. For VP ellipsis, the system's accuracy exceeds 90%
under the head overlap criterion, which arguably gives the
best picture of the system's effectiveness. The system is
currently being extended to other forms of ellipsis, such as
stripping and sluicing.
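The three criteria are not defined in this summary. The sketch below
shows one plausible reading, stated here as an assumption: exact match
requires the two choices to be word-for-word identical, head match
requires the same head verb, and head overlap requires that the head
verb of one choice occur within the other. The Choice type and the
example antecedents are hypothetical.

```python
from typing import NamedTuple, Tuple


class Choice(NamedTuple):
    """Hypothetical representation of an antecedent choice."""
    words: Tuple[str, ...]  # tokens of the chosen antecedent
    head: str               # head verb of the chosen antecedent


def exact_match(system: Choice, coder: Choice) -> bool:
    # Assumed definition: the two choices are identical word for word.
    return system.words == coder.words


def head_match(system: Choice, coder: Choice) -> bool:
    # Assumed definition: the two choices share the same head verb.
    return system.head == coder.head


def head_overlap(system: Choice, coder: Choice) -> bool:
    # Assumed definition: the head verb of either choice occurs in the other.
    return system.head in coder.words or coder.head in system.words


# The coder picks the larger VP, the system picks the embedded one.
coder = Choice(("wants", "to", "read", "the", "report"), "wants")
system = Choice(("read", "the", "report"), "read")

print(head_overlap(system, coder), head_match(system, coder), exact_match(system, coder))
# True False False
```

Under this reading the criteria are ordered from most lenient to most
strict, so a choice that differs from the coder's only in how much of a
nested VP it includes still counts as correct under head overlap.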
The system would represent a practical
solution to a problem that confronts virtually any Natural
Language Processing application that attempts to process
English in a realistic setting. In addition,
the system could be used as a tool in further annotating treebanks
with ellipsis resolution information. Finally, the project has
resulted in a large amount of valuable data for theoreticians
studying ellipsis and related phenomena.
PROJECT REFERENCES
Nicholas Asher, Daniel Hardt, and Joan Busquets. 1997.
Discourse Parallelism, Scope, and Ellipsis.
Proceedings of the Seventh Conference on Semantics and
Linguistic Theory. Palo Alto, CA.
Matthew Stone and Daniel Hardt. 1997.
Dynamic Discourse Referents for Tense and Modals.
Proceedings of the Second International Workshop on
Computational Semantics. Tilburg, Netherlands.
Daniel Hardt. 1996. Centering in Dynamic Semantics. Proceedings
of the Sixteenth International Conference on
Computational Linguistics. Copenhagen, Denmark.
Daniel Hardt. 1997. An Empirical Approach to VP Ellipsis.
Submitted for publication.
Daniel Hardt. 1996a. Dynamic Interpretation of VP Ellipsis.
Submitted for publication.
Daniel Hardt. 1995. An Empirical Approach to VP Ellipsis.
AAAI Spring Symposium on Empirical Methods in Discourse
Interpretation and Generation.
Daniel Hardt. 1994. Sense and Reference in Dynamic Semantics.
Proceedings of the Ninth Amsterdam Colloquium. Amsterdam,
Netherlands.
Daniel Hardt. 1993. VP Ellipsis: Form, Meaning, and Processing.
Ph.D. Dissertation. University of Pennsylvania.
Daniel Hardt. 1992. VP Ellipsis and Contextual Interpretation.
Proceedings of the Fourteenth International Conference on
Computational Linguistics. Nantes, France.
Daniel Hardt. 1992a. An Algorithm for VP Ellipsis.
Proceedings, 30th Annual Meeting of the Association for Computational
Linguistics. Newark, DE.
Daniel Hardt. 1992b. Some Problematic Cases of VP Ellipsis.
Proceedings, 30th Annual Meeting of the Association for Computational
Linguistics. Newark, DE.
Daniel Hardt. 1992c. VP Ellipsis and Semantic Identity.
Proceedings of the Second Conference on Semantics and Linguistic
Theory.
Edited by Chris Barker and David Dowty. Columbus, OH.
Daniel Hardt. 1991. A Discourse Model Approach to VP Ellipsis.
Proceedings AAAI Symposium on Discourse Structure in Natural Language
Understanding and Generation. Asilomar, CA.
Daniel Hardt. 1991a. Towards a Discourse Level Account of VP Ellipsis.
Proceedings of the 8th Eastern States Conference on Linguistics.
G. Westphal, J. Dai and B. Ao (editors). Ohio State University.
AREA BACKGROUND
Elliptical expressions are a pervasive feature of ordinary
English usage, and thus constitute a practical problem for
any NLP system that would process English in a realistic
setting. There has been a great deal of theoretical work
investigating the underlying representations and mechanisms
in a range of elliptical constructions (see Hardt 96a, Hardt 93,
and references cited there). The current project
is the first empirically oriented investigation of elliptical
phenomena, relying on the
syntactically annotated Penn Treebank to develop and test a system
for resolving elliptical forms. An earlier manual approach
to this problem is described in Hardt 92. Similar
empirically oriented work has been pursued on the problem of
pronoun resolution (see, for example, Hobbs 78, Walker 89, and
Lappin and Leass 94, among many others).
AREA REFERENCES
Daniel Hardt. 1997. An Empirical Approach to VP Ellipsis.
Submitted for publication.
Daniel Hardt. 1996. Centering in Dynamic Semantics. COLING 96.
Daniel Hardt. 1996a. Dynamic Interpretation of VP Ellipsis.
Submitted for publication.
Daniel Hardt. 1993. VP Ellipsis: Form, Meaning, and Processing.
Ph.D. Dissertation. University of Pennsylvania.
Daniel Hardt. 1992. An Algorithm for VP Ellipsis.
Proceedings, 30th Annual Meeting of the Association for Computational
Linguistics. Newark, DE.
Jerry Hobbs. 1978. Resolving pronoun references. Lingua,
44:311--338.
Shalom Lappin and Herbert J. Leass. 1994.
An algorithm for pronominal anaphora resolution.
Computational Linguistics.
Marilyn Walker. 1989. Evaluating discourse processing algorithms.
In Proceedings, 27th Annual Meeting of the ACL, Vancouver, Canada.
RELATED PROGRAM AREAS
4. Adaptive Human Interfaces.
5. Usability and User-Centered Design.
6. Intelligent Interactive Systems for Persons with Disabilities.
POTENTIAL RELATED PROJECTS
The proposed project will result in a system that resolves a variety
of well-defined forms of ellipsis in English. This might be
of use in the area "Usability and User-Centered Design". It is often
more natural for humans to use elliptical or reduced forms of
input. The proposed system could make this possible, provided that
a basic syntactic structure for the input is available. That structure
could be supplied either by a broad-coverage parser for English or
by restricting the input language to a parsable fragment of English.
In either case, the possibility of elliptical input might enhance the
"cognitive ergonomics" of the system. These issues might also be
relevant for the areas of "Adaptive Human Interfaces" and
"Intelligent Interactive Systems for Persons with Disabilities".