Department of Computer Science
and
Center for Language and Speech Processing
Johns Hopkins University
E. Brill. Some Advances in Rule-Based Part of Speech Tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, 1994.
E. Brill and P. Resnik. A Transformation-Based Approach to Prepositional Phrase Attachment Disambiguation. Proceedings of the Fifteenth International Conference on Computational Linguistics (COLING-1994), Kyoto, Japan, 1994.
E. Brill. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging. Computational Linguistics, 1995.
E. Brill. Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging. Proceedings of the Third ACL Workshop on Very Large Corpora, 1995.
G. Satta and E. Brill. Efficient Transformation-Based Parsing. Proceedings of the Association for Computational Linguistics, 1996.
E. Brill. Learning to Parse with Transformations. In: Recent Advances in Parsing Technology, Kluwer, 1996.
L. Mangu and E. Brill. Automatic Rule Acquisition for Spelling Correction. Proceedings of the International Conference on Machine Learning, 1997.
G. Satta and J. Henderson. String Transformation Learning. Proceedings of the Association for Computational Linguistics, 1997.
For a program to annotate a sentence accurately, it must be provided with a great deal of information about language. Until recently, such knowledge was typically hand-coded by language engineers, a time-consuming process that rarely resulted in accurate, robust systems. This linguistic knowledge acquisition bottleneck has made it difficult to create accurate linguistic annotation programs, and the resulting inability to analyze the linguistic structure of a sentence accurately has hindered the development of sophisticated natural language processing systems.
Over the past few years, there has been a major shift from deriving linguistic information manually to extracting it automatically from on-line resources such as corpora, dictionaries and encyclopedias. In addition, a number of text corpora have been carefully annotated with linguistic information, and these corpora are also valuable resources for automatic knowledge acquisition. Programs that use machine learning techniques to learn linguistic information automatically are becoming steadily more reliable as more sophisticated techniques are developed and larger training corpora become available. It is our hope that the development of accurate and portable linguistic annotation algorithms will make it possible to create highly sophisticated natural language processing systems in the near future.
K. Church and R. Mercer. Introduction to the Special Issue on Computational Linguistics Using Large Corpora. Computational Linguistics, 1993.
E. Charniak. Statistical Language Learning. MIT Press, 1993.