Department of Computer Science
Cornell University
Improving Minority Class Prediction Using Case-Specific Feature Weights. C. Cardie and N. Howe. Proceedings of the Fourteenth International Conference on Machine Learning, to appear.
Examining Locally Varying Weights for Nearest Neighbor Algorithms, N. Howe and C. Cardie. Proceedings of the Second International Conference on Case-Based Reasoning, to appear.
An Analysis of Statistical and Syntactic Phrases, M. Mitra, C. Buckley, A. Singhal, and C. Cardie. 5th RIAO Conference, Computer-Assisted Information Searching On the Internet, to appear.
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge. C. Cardie. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 113-126, University of Pennsylvania, 1996.
Embedded Machine Learning Systems for Natural Language Processing: A General Framework. C. Cardie. In Wermter, S. and Riloff, E. and Scheler, G. (eds.), Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Lecture Notes in Artificial Intelligence, 315-328, Springer, 1996.
Domain-Specific Knowledge Acquisition for Conceptual Sentence Analysis, C. Cardie. Ph.D. Thesis, University of Massachusetts, Amherst, MA, 1994. Available as University of Massachusetts, CMPSCI Technical Report 94-74.
University of Massachusetts/Hughes: Description of the CIRCUS System as Used for MUC-5. W. Lehnert, J. McCarthy, S. Soderland, E. Riloff, C. Cardie, J. Peterson, and F. Feng; C. Dolan, and S. Goldman. Proceedings of the Fifth Message Understanding Conference (MUC-5), Baltimore, MD, Morgan Kaufmann, 1994.
A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis, C. Cardie. Proceedings of the Eleventh National Conference on Artificial Intelligence, 798-803, Washington, DC, AAAI Press / MIT Press, 1993.
Using Decision Trees to Improve Case-Based Learning, C. Cardie. Proceedings of the Tenth International Conference on Machine Learning, 25-32, Amherst, MA, Morgan Kaufmann, 1993.
Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics, C. Cardie. Proceedings of the 30th Annual Conference of the Association for Computational Linguistics, 216-223, Newark, DE, Association for Computational Linguistics, 1992.
Learning to Disambiguate Relative Pronouns, C. Cardie. Proceedings of the Tenth National Conference on Artificial Intelligence, 38-43, San Jose, CA, AAAI Press / MIT Press, 1992.
Proceedings of the Conferences on Empirical Methods in Natural Language Processing, 1996 and 1997. Available through the Association for Computational Linguistics (ACL).
Proceedings of the Workshops on Very Large Corpora, 1993-1997. Available through the Association for Computational Linguistics (ACL).
Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Wermter, S. and Riloff, E. and Scheler, G. (eds.), Lecture Notes in Artificial Intelligence, Springer, 1996.
The state of affairs for many end-users of existing information retrieval (IR) systems, Web search engines, and natural language interfaces to document collections is far from optimal. To maintain general-purpose retrieval capabilities, current IR systems attempt to balance performance with respect to precision and recall measures. In response to a user query, for example, the system will return as many useful documents as possible, intermingling them with numerous non-useful documents. Often, however, users would prefer to see a small set of documents, all of which are deemed useful. This scenario requires a retrieval mechanism that emphasizes precision over recall. Unfortunately, the frustration of end-users does not end once a relevant document is found: existing text retrieval systems provide only the simplest methods for browsing the document (e.g., page by page) and provide no automated means for extracting pertinent information from the text in a usable form.
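The trade-off between precision and recall described above can be made concrete with a small sketch. It uses the standard set-based definitions (precision is the fraction of returned documents that are useful; recall is the fraction of useful documents that are returned). The document ids, the `precision_recall` helper, and the two example result lists are hypothetical and chosen only for illustration; they are not taken from any system discussed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # useful documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: d1..d4 are the documents the user finds useful.
relevant = {"d1", "d2", "d3", "d4"}

# Recall-oriented response: every useful document, mixed with non-useful ones.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]

# Precision-oriented response: a small set, all of it useful.
narrow = ["d1", "d2"]

broad_pr = precision_recall(broad, relevant)
narrow_pr = precision_recall(narrow, relevant)
print(broad_pr)   # (0.4, 1.0)
print(narrow_pr)  # (1.0, 0.5)
```

The broad response finds everything but forces the user to sift through six non-useful documents; the narrow response misses half of the useful documents but wastes none of the user's attention, which is the behavior a precision-oriented retrieval mechanism aims for.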
There are (at least) two ways that natural language learning
techniques can be used to improve a user's ability to find and extract
information from on-line text. First, we can combine our machine
learning approach to natural language understanding with traditional
statistical approaches to IR to improve the precision of
state-of-the-art IR systems. An IR system locates a relevant text by
measuring the degree of vocabulary overlap between the user's
information request (i.e., the query) and each document in the
collection. In theory, a linguistic analysis of the query and
documents should be able to provide additional constraints on a
high-precision search --- constraints that would be unavailable to a
purely statistical text analyzer. It is one of our goals to use the
natural language learning techniques developed in Kenmore to create a
A second direction of research is to develop