
FrameNet

Charles J. Fillmore

Dept. of Linguistics, UC Berkeley and
International Computer Science Institute

CONTACT INFORMATION

1947 Center St. Suite 600
Berkeley, CA, 94704
Phone: (510) 642-4274 ext. 314
Fax: (510) 643-7684
Email: fillmore@icsi.berkeley.edu

WWW PAGE

http://www.icsi.berkeley.edu/~framenet

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

semantically annotated lexicon, frame semantics, semantic representation, argument structure, corpus linguistics

PROJECT SUMMARY

A significant roadblock to the development of practical natural language processing applications is the lack of large-scale lexical resources with the right kind and amount of information. Such resources must not only cover the breadth of a language's basic vocabulary but also provide appropriate syntactic, semantic, and statistical information about individual lexical items. The FrameNet project is an attempt to create such a next-generation lexical database.

The end product of the FrameNet Project (NSF IRI-9618838, "Tools for Lexicon Building", March 1997-February 2000) will be a database consisting of (1) a list of the semantic frames needed to describe the meanings of the words in the target domains of health care, chance, perception, communication, transaction, time, space, body, motion, stages of life, social context, emotion, and cognition, and (2) a combined dictionary and thesaurus giving frame-semantic, combinatory, and probabilistic descriptions of 5,000 lexical units representative of these domains.

These descriptions will match lexemes with semantic frames, indicate which elements of a given frame are involved in each lexical meaning, and specify how these elements are realized syntactically in sentences containing the lexical units. For example, in the health care domain, frame descriptions will refer to such elements as Healer, Patient, Treatment, Disorder, Body Part, Wound, and Medicine; and the word list will include the verbs cure, heal, treat, recover, etc.
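To make the idea concrete, here is a minimal Python (3.9+) sketch of how a frame and the lexical units based on it might be represented in memory. The frame name Health_care, the class names, and the particular elements assigned to each verb are illustrative assumptions, not FrameNet's actual schema or analyses; only the frame element names and the verbs come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    """A semantic frame: a named situation type plus its frame elements (FEs)."""
    name: str
    elements: frozenset[str]

@dataclass(frozen=True)
class LexicalUnit:
    """One word in one sense, tied to the frame that sense is based on."""
    lemma: str
    pos: str
    frame: Frame
    involved: frozenset[str]  # the subset of the frame's FEs this meaning involves

    def __post_init__(self) -> None:
        # Every FE a word claims to involve must belong to its frame.
        stray = self.involved - self.frame.elements
        if stray:
            raise ValueError(f"{sorted(stray)} not in frame {self.frame.name}")

# Hypothetical frame name; the FE names are those of the health care example.
HEALTH_CARE = Frame("Health_care", frozenset(
    {"Healer", "Patient", "Treatment", "Disorder", "Body Part", "Wound", "Medicine"}))

# Illustrative FE choices only, not FrameNet's published analyses.
cure = LexicalUnit("cure", "v", HEALTH_CARE, frozenset({"Healer", "Patient", "Disorder"}))
recover = LexicalUnit("recover", "v", HEALTH_CARE, frozenset({"Patient", "Disorder"}))

# Words in the same frame can differ in which elements they bring in.
print(sorted(cure.involved - recover.involved))
```

Recording the involved elements per word rather than per frame is what lets two words in the same frame, such as cure and recover, differ in which participants they bring into view.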

Each word that is polysemous or has alternative valence patterns will also be provided with data on the relative frequency, in our corpora, of each usage. Furthermore, each use will be illustrated with ten examples, fully annotated for the grammatical realization of the relevant frame elements. The work is described as "frame-based" because its semantic underpinnings use Frame Semantic formalisms, and as "corpus-based" because the evidence will be drawn from a large corpus of English text.
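Relative frequency data of this kind amount to simple proportions: the number of corpus tokens showing a given usage divided by the total number of tokens of the word. The following is a minimal sketch, assuming each token has already been classified by hand; the usage labels and counts are invented for illustration and are not corpus data.

```python
from collections import Counter

def relative_frequencies(usages):
    """Map each usage label to its share of all observed tokens of the word."""
    counts = Counter(usages)
    if not counts:
        return {}  # no tokens observed: nothing to report
    total = sum(counts.values())
    return {usage: n / total for usage, n in counts.items()}

# Invented example: ten tokens of "treat", each hand-classified by sense and
# valence pattern. Real figures would come from the annotated corpus sample.
observed = (["medical: Healer treats Patient"] * 6
            + ["medical: Healer treats Patient for Disorder"] * 3
            + ["other sense"] * 1)

freqs = relative_frequencies(observed)
for usage, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{share:.0%}  {usage}")
```

Keeping sense and valence pattern in the same label means a single count covers both polysemy and valence alternation, which is how the paragraph above treats them.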

First, we are developing a suite of corpus management tools for searching, assembling, and processing concordance lines from large-scale English-language corpora, and for annotating these lines to show how the frame-semantic requirements of individual lexical items are realized in them.
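The concordance lines mentioned here are keyword-in-context (KWIC) displays: every occurrence of a search word, shown with a fixed window of context on each side. The Stuttgart tools produce these over an indexed, tagged corpus; the Python sketch below only illustrates the KWIC layout on plain text, and the function name and window width are assumptions, not part of those tools.

```python
import re

def concordance(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return KWIC lines: each whole-word match of `keyword`, bracketed,
    with up to `width` characters of left and right context."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = rf"\b{re.escape(keyword)}\b"  # \b keeps "cure" from matching "secure"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].rstrip()
        right = text[m.end():m.end() + width].lstrip()
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

sample = """The new drug may cure the infection.
Nothing could secure a cure. Doctors hope to cure more patients."""
for line in concordance(sample, "cure", width=20):
    print(line)
```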

Second, using these tools, we have begun to build the lexical database, refining frame-semantic descriptions of the major frames within each of these domains, and preparing lexical descriptions that can enhance existing natural language processing resources. The lexical databases we produce, together with the corpus annotation tools developed in the process, will be freely distributed for use by the research community.

For each frame, the Project's work proceeds in four main phases:

  1. Phase One: Initial Description of the Frame

    The "armchair" phase, in which the lexicographer-semanticist works out how a particular frame is structured, names its frame elements, and chooses a list of words whose meanings are based on the frame.

  2. Phase Two: The Assembly Line

    An "assembly line" process in which corpus examples are extracted for individual words, and sentences containing them are given syntactic and semantic annotations.

  3. Phase Three: Preparation of the Entry

    Preparation of the database entry by the lexicographer-semanticist, summarizing the information assembled in Phase Two.

  4. Phase Four: Addition of Relative Frequency Data

    The addition of frequency information on each use and valence alternative, based on uses of the lemma in the British National Corpus (BNC) as a whole (done by subcontractor Daniel Jurafsky at the University of Colorado).

(The work plan given above is due mainly to our two outside consultants: Dr. Ulrich Heid of the University of Stuttgart, planning adviser, and Sue Atkins, lexicographic adviser.)
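Phases Two and Three can be pictured with a small data model: each annotated sentence records, for every frame element present, its grammatical function and phrase type, and the entry then summarizes those records. This Python sketch is a hypothetical illustration; the tag names (Subj, Obj, Comp, NP, PP[of]), the example sentences, and the function names are assumptions, not the project's actual annotation scheme or tools.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

class FELabel(NamedTuple):
    """One frame element as annotated in one sentence."""
    fe: str  # frame element, e.g. "Patient"
    gf: str  # grammatical function (hypothetical tags: Subj, Obj, Comp)
    pt: str  # phrase type (hypothetical tags: NP, PP[of])

# Phase Two (hypothetical data): annotations of three sentences with target "cure".
annotated = [
    # The doctor cured the patient.
    [FELabel("Healer", "Subj", "NP"), FELabel("Patient", "Obj", "NP")],
    # The doctor cured her of pneumonia.
    [FELabel("Healer", "Subj", "NP"), FELabel("Patient", "Obj", "NP"),
     FELabel("Disorder", "Comp", "PP[of]")],
    # The new therapy cured the patient.
    [FELabel("Treatment", "Subj", "NP"), FELabel("Patient", "Obj", "NP")],
]

def realizations_by_fe(sentences):
    """Phase Three summary: how often each FE is realized as each GF/PT pair."""
    table = defaultdict(Counter)
    for sentence in sentences:
        for label in sentence:
            table[label.fe][f"{label.gf}/{label.pt}"] += 1
    return table

def valence_patterns(sentences):
    """Count each distinct combination of FE realizations (a valence pattern)."""
    return Counter(tuple(sorted(sentence)) for sentence in sentences)

for fe, counts in sorted(realizations_by_fe(annotated).items()):
    print(fe, dict(counts))
print(len(valence_patterns(annotated)), "distinct valence patterns")
```

Keeping frame element, grammatical function, and phrase type as separate fields is what allows an entry to report both which elements a word takes and how each one is expressed.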

PRESENT STATE

At present:

  1. We have partial word lists for a number of domains.
  2. We have adequate frame descriptions for perception and sensation, communication, and commercial transactions, and partial frame descriptions for space, time, motion, and health care.
  3. The British National Corpus is installed and working, together with the Stuttgart corpus management tools Xkwic and CQP and a macro processor based on CQP.
  4. Initial drafts of operator manuals for syntactic tagging are nearing completion.
  5. We have a large and growing bibliography on semantic topics in our areas, accessible through our web page.
  6. Project staff are acquiring the necessary skills in Unix, Emacs, Xkwic, and LaTeX, and are being trained in grammatical analysis and frame-semantic description.
  7. The syntactic pattern-filtering commands, which run CQP over corpus material, are nearing completion.
  8. The sorting and tagging tools and the annotation tools, which operate on text files, are being prepared.

We expect the annotation process to be in full operation by October 1997.

PROJECT REFERENCES

Fillmore, Charles J. 1971. Verbs of judging: an exercise in semantic description. In Fillmore and Langendoen (eds.) Studies in Linguistic Semantics. New York: Holt, Rinehart & Winston. 272-289.

Fillmore, Charles J. 1975. An alternative to checklist theories of meaning. In Papers from the First Annual Meeting of the Berkeley Linguistics Society. 123-132.

Fillmore, Charles J. 1976. Frame semantics and the nature of language. In Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, Vol. 280. 20-32.

Fillmore, Charles J. 1977a. Scenes-and-frames semantics. In Antonio Zampolli (ed.), Linguistic Structures Processing (Fundamental Studies in Computer Science, 59). North-Holland Publishing. 55-82.

Fillmore, Charles J. 1977b. The need for a frame semantics in linguistics. In Hans Karlgren (ed.), Statistical Methods in Linguistics. Scriptor. 5-29.

Fillmore, Charles J. 1982. Frame Semantics. In Linguistics in the Morning Calm. Hanshin Publishing Co., Seoul, South Korea. 111-137.

Fillmore, Charles J. 1985. Frames and the semantics of understanding. Quaderni di Semantica. 6.2.222-254.

Fillmore, Charles J. 1986. U-Semantics, second round. Quaderni di Semantica. 8.49-58.

Fillmore, Charles J. 1992. "Corpus linguistics" vs. "Computer-aided armchair linguistics". In Directions in Corpus Linguistics: Proceedings from a 1991 Nobel Symposium on Corpus Linguistics, Stockholm. Mouton de Gruyter. 35-66.

Fillmore, Charles J., & B.T.S. Atkins. 1992. Towards a Frame-based Lexicon: the Semantics of RISK and its Neighbors. In A. Lehrer and E. F. Kittay (eds.), Frames, Fields and Contrasts. Lawrence Erlbaum Associates: Hillsdale, NJ. 75-102.

Fillmore, Charles J., & B.T.S. Atkins. 1994. Starting where the dictionaries stop: the challenge for computational lexicography. In B.T.S. Atkins and A. Zampolli (eds.), Computational Approaches to the Lexicon. Oxford University Press, Oxford.

Lowe, J.B., C.F. Baker, & C.J. Fillmore. 1997. A frame-semantic approach to semantic annotation. In Proceedings of the SIGLEX Workshop "Tagging Text with Lexical Semantics: Why, What, and How?", held April 4-5, 1997, in Washington, D.C., USA, in conjunction with ANLP-97.

Jurafsky, Daniel. 1996. A Probabilistic Model of Lexical and Syntactic Access and Disambiguation. Cognitive Science 20, 137-194.

AREA BACKGROUND

Researchers and developers in natural language processing (NLP), including information retrieval, machine translation, speech recognition, and artificial intelligence, need large-scale lexical information sources. This remains an open research area because no source available today gives, for an ordinary English word, anything close to the full range of its semantic and combinatorial properties, or a description usable across projects and across grammatical theories. No published dictionary contains more than a fraction of the information needed for an adequate NLP lexicon. None identifies systematic correspondences between the meaning of an item and the syntactic patterns of the phrases and sentences built around it, and none gives probabilities for the different kinds of syntactic and semantic arguments that can occur with an item, or for the different senses of ambiguous words.

FrameNet is one of a number of projects designed to supply such lexical information. Each project focuses on different kinds of lexical information: pronunciation, morphology, part of speech, syntax, semantics, or frequency. One way to build such a database is to convert an existing dictionary into machine-readable form, producing a machine-readable dictionary (MRD). Other lexicon projects include the COMLEX lexicon (Wolff, Macleod, and Meyers 1993; Macleod, Grishman, and Meyers 1994; see also the COMLEX project summary), the CELEX database (Piepenbrock 1993; Baalen 1991; Burnage 1990; Schreuder and Kerkman 1987), and WordNet (Miller et al. 1990).

AREA REFERENCES

Atkins, B.T.S., & A. Zampolli (eds.). 1994. Computational Approaches to the Lexicon. Oxford University Press, Oxford.

Boguraev, Branimir, & James Pustejovsky (eds.). 1993. Acquisition of Lexical Knowledge from Text. Proceedings of a Workshop Sponsored by the Special Interest Group on the Lexicon of the Association for Computational Linguistics, 21 June 1993, Ohio State University, Columbus, Ohio. Published by the ACL.

Macleod, Catherine, Ralph Grishman, & A. Meyers. 1994. The COMLEX Project: the first year. In Proceedings of the Human Language Technology Workshop. San Francisco: Morgan Kaufmann. 8-12.

Miller, George A., R. Beckwith, C. Fellbaum, D. Gross, & K.J. Miller. 1990. Introduction to WordNet: an on-line lexical database. In George A. Miller (guest ed.), International Journal of Lexicography, 3:4, Winter 1990. Oxford University Press, Oxford.

Resnik, Philip S. 1993. Selection and Information: A Class-Based Approach to Lexical Relationships. Unpublished Ph.D. dissertation, University of Pennsylvania.

RELATED PROGRAM AREAS

Adaptive Human Interfaces

POTENTIAL RELATED PROJECTS

The FrameNet database will be an important adjunct to existing NLP tool sets such as WordNet and COMLEX. It will give added power to parsers, information retrieval systems, and natural language understanding systems by supplying significant semantic and statistical information.

The project is closely related to ongoing work on WordNet and to the now-completed EC-funded DELIS project, which alpha-tested the theory, methodology, and technology on which our project is based. The work we propose is highly compatible with a number of current research initiatives, among them: the CSLI / Universität Saarbrücken HPSG Project, whose findings will be enriched by our semantically rich database; an effort to integrate the HPSG and Construction Grammar formalisms, whose lexical component the present project will help develop (this effort is now facilitated by the presence of Andreas Kathol, who is associated with both ICSI and the Berkeley Linguistics Department); the EC-funded PAROLE project (Calzolari & Spanu 1996), which is building national corpora; and several ongoing projects at ICSI, including: