Dept. of Linguistics, UC Berkeley and
International Computer Science Institute
The end product of the FrameNet Project (NSF IRI-9618838, "Tools for Lexicon Building", March 1997 to February 2000) will be a database consisting of (1) a list of the semantic frames needed to describe the meanings of words in the target domains of health care, chance, perception, communication, transaction, time, space, body, motion, stages of life, social context, emotion, and cognition, and (2) a combined dictionary and thesaurus giving frame-semantic, combinatory, and probabilistic descriptions of 5,000 lexical units representative of these domains.
These descriptions will match lexemes with semantic frames, indicate which elements of a given frame are involved in each lexical meaning, and specify how these elements are realized syntactically in sentences containing the lexical units. For example, in the health care domain, frame descriptions will refer to such elements as Healer, Patient, Treatment, Disorder, Body Part, Wound, and Medicine; and the word list will include the verbs cure, heal, treat, recover, etc.
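To make the pairing of lexemes, frames, and frame elements concrete, here is a minimal Python sketch of how such an entry and one annotated example sentence might be represented. The class names (`Frame`, `LexicalUnit`, `Realization`, `AnnotatedSentence`), the frame label `HealthCare`, and the phrase-type and grammatical-function labels are hypothetical illustrations, not the FrameNet database schema; only the frame element names come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """A semantic frame and the frame elements (roles) it defines."""
    name: str
    elements: frozenset[str]


@dataclass(frozen=True)
class LexicalUnit:
    """One sense of a word, paired with the frame that sense evokes."""
    lemma: str
    pos: str
    frame: Frame


@dataclass
class Realization:
    """How one frame element is expressed syntactically in a sentence."""
    element: str
    text: str
    phrase_type: str           # hypothetical labels, e.g. "NP", "PP[of]"
    grammatical_function: str  # hypothetical labels, e.g. "subject", "object"


@dataclass
class AnnotatedSentence:
    """A corpus sentence annotated for one target lexical unit."""
    sentence: str
    target: LexicalUnit
    realizations: list[Realization] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject frame element labels that the target's frame does not define.
        for r in self.realizations:
            if r.element not in self.target.frame.elements:
                raise ValueError(f"{r.element!r} is not an element of {self.target.frame.name}")

    def valence_pattern(self) -> tuple[tuple[str, str, str], ...]:
        """(element, phrase type, function) triples in left-to-right sentence order."""
        ordered = sorted(self.realizations, key=lambda r: self.sentence.find(r.text))
        return tuple((r.element, r.phrase_type, r.grammatical_function) for r in ordered)


# Hypothetical entry built from the health care elements named in the text.
health_care = Frame(
    "HealthCare",
    frozenset({"Healer", "Patient", "Treatment", "Disorder",
               "Body Part", "Wound", "Medicine"}),
)
cure = LexicalUnit("cure", "V", health_care)
example = AnnotatedSentence(
    "The doctor cured the patient of pneumonia.",
    cure,
    [
        Realization("Patient", "the patient", "NP", "object"),
        Realization("Healer", "The doctor", "NP", "subject"),
        Realization("Disorder", "of pneumonia", "PP[of]", "complement"),
    ],
)
print(example.valence_pattern())
```

Returning the valence pattern as an ordered tuple makes it hashable, so the patterns found across many annotated sentences for the same lexical unit can be counted directly.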
Each word that is polysemous or has alternative valence patterns will also be given relative frequency data showing how often each usage occurs in our corpora. Each usage will be illustrated with ten examples, fully annotated for the grammatical realization of the relevant frame elements. The work is described as "frame-based" because its semantic underpinnings use Frame Semantic formalisms, and as "corpus-based" because its evidence will be drawn from a large corpus of English text.
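Once each corpus hit carries a usage label, the relative frequency data and the ten-example limit described above reduce to simple counting. The Python sketch below assumes such labels already exist (assigning them is the hard, manual part of the work); the function names and the usage labels are hypothetical, and the toy counts are invented for illustration, not drawn from any corpus.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def relative_frequencies(hits):
    """Map each lemma to the share of its corpus hits that fall under each usage.

    `hits` is an iterable of (lemma, usage) pairs, one per corpus occurrence;
    a usage label may stand for a sense or for a valence pattern.
    """
    counts = defaultdict(Counter)
    for lemma, usage in hits:
        counts[lemma][usage] += 1
    result = {}
    for lemma, usage_counts in counts.items():
        total = sum(usage_counts.values())
        # Fractions keep the shares exact, so they sum to exactly 1 per lemma.
        result[lemma] = {usage: Fraction(n, total) for usage, n in usage_counts.items()}
    return result


def example_pool(tagged_sentences, per_usage=10):
    """Keep at most `per_usage` example sentences for each (lemma, usage) pair."""
    pool = defaultdict(list)
    for lemma, usage, sentence in tagged_sentences:
        bucket = pool[(lemma, usage)]
        if len(bucket) < per_usage:
            bucket.append(sentence)
    return dict(pool)


# Toy, made-up tags for illustration only; these are not corpus counts.
toy_hits = [
    ("treat", "medical"), ("treat", "medical"), ("treat", "medical"),
    ("treat", "regard_as"),
    ("cure", "NP_NP"), ("cure", "NP_NP_PP[of]"),
]
freqs = relative_frequencies(toy_hits)
print(freqs["treat"])
```

Here `example_pool` simply keeps the first ten sentences per usage; choosing the most representative examples would of course need human judgment.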
First, we are developing a suite of corpus management tools for searching, assembling, and processing concordance lines from large-scale English-language corpora, and annotating these lines to display the ways in which the frame-semantic requirements of individual lexical items are realized in them.
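Concordance lines of this kind are conventionally displayed in keyword-in-context (KWIC) format. As a rough illustration only, and not the project's actual corpus tools, the Python sketch below produces KWIC lines from plain text. It assumes the caller supplies the inflected forms to search for (a fuller tool would likely use a lemmatizer and part-of-speech tags), and the function name `concordance` is hypothetical.

```python
import re


def concordance(text, forms, width=30):
    """Return keyword-in-context (KWIC) lines for any of the given word forms.

    Each line shows the match in brackets with up to `width` characters of
    left and right context; whitespace (including newlines) is flattened.
    """
    flat = " ".join(text.split())
    # Whole-word, case-insensitive match on any of the listed forms.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()].rstrip()
        right = flat[m.end():m.end() + width].lstrip()
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = ("The doctor cured the patient.\n"
          "Rest heals most wounds, and time cures the rest.")
for line in concordance(sample, ["cure", "cured", "cures"], width=20):
    print(line)
```

Aligning every keyword in one column is what makes it practical to scan many lines for recurring syntactic patterns before annotating them.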
Second, using these tools, we have begun to build the lexical database, refining frame-semantic descriptions of the major frames within each of these domains, and preparing lexical descriptions that can enhance existing natural language processing resources. The lexical databases we produce, together with the corpus annotation tools developed in the process, will be freely distributed for use by the research community.
Within each frame, there are four main phases to the Project's work:
Researchers and developers in natural language processing (NLP), including information retrieval, machine translation, speech recognition, and artificial intelligence, need large-scale sources of lexical information. This remains an open research problem: for an ordinary English word, no source available today gives anything close to the full range of its semantic or combinatorial properties, or a description that is usable across projects and across grammatical theories. No published dictionary contains more than a fraction of the information an adequate NLP lexicon requires. None identifies systematic correspondences between the meaning of an item and the syntactic patterns of the phrases and sentences built around it. And none gives probabilities for the different kinds of syntactic and semantic arguments that can occur with an item, or for the different senses of an ambiguous word.
FrameNet is one of a number of projects designed to supply such lexical information. Each project focuses on a different kind of lexical information: pronunciation, morphology, part of speech, syntax, semantics, or frequency. One way to build such a database is to convert an existing dictionary into machine-readable form, producing a machine-readable dictionary (MRD). Other lexicon projects include the COMLEX lexicon (Wolff, Macleod, and Meyers 1993; Macleod, Grishman, and Meyers 1994; see also their project summary), the CELEX database (Piepenbrock 1993; Baalen 1991; Burnage 1990; Schreuder and Kerkman 1987), and WordNet (Miller et al. 1990).
Boguraev, Branimir, & James Pustejovsky (eds.). 1993. Acquisition of Lexical Knowledge from Text. Proceedings of a Workshop Sponsored by the Special Interest Group on the Lexicon of the Association for Computational Linguistics, 21 June 1993, Ohio State University, Columbus, Ohio. Published by the ACL.
Macleod, Catherine, Ralph Grishman, & A. Meyers. 1994. The COMLEX Project: the first year. In Proceedings of the Human Language Technology Workshop. San Francisco: Morgan Kaufmann. 8-12.
Miller, George A., R. Beckwith, C. Fellbaum, D. Gross, & K.J. Miller. 1990. Introduction to WordNet: an on-line lexical database. In George A. Miller (guest ed.), International Journal of Lexicography, 3:4, Winter 1990. Oxford University Press, Oxford.
Resnik, Philip S. 1993. Selection and Information: A Class-Based Approach to Lexical Relationships. Unpublished Ph.D. dissertation, University of Pennsylvania.
The project is closely related to ongoing work on WordNet and to the now-completed EC-funded DELIS project, which alpha-tested the theory, methodology, and technology on which our project is based. The work we propose is highly compatible with a number of current research initiatives, among them: the CSLI / Universität Saarbrücken HPSG Project, whose findings will be enriched by our semantically rich database; an effort to integrate the HPSG and Construction Grammar formalisms (now facilitated by the presence of Andreas Kathol, who is associated with both ICSI and the Berkeley Linguistics Department), into whose lexical component the work of the present project will feed; the EC-funded PAROLE project (Calzolari & Spanu 1996), which is building national corpora; and several ongoing projects at ICSI, including: