Modeling Disfluencies in Spontaneous Speech

Elizabeth Shriberg¹, Herbert Clark², Stefanie Shattuck-Hufnagel³, Patti Price¹

¹ Speech Technology and Research Laboratory, SRI International
² Department of Psychology, Stanford University
³ Research Laboratory of Electronics, Massachusetts Institute of Technology

CONTACT INFORMATION

Elizabeth Shriberg
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025
Phone: (415) 859-3798
Fax: (415) 859-5984
Email: ees@speech.sri.com

WWW PAGE

http://www.speech.sri.com/projects/disfluencies.html

PROGRAM AREA

Speech and Natural Language Understanding

KEYWORDS

disfluency, repair, psycholinguistics, speech recognition, natural language understanding, prosody

PROJECT SUMMARY

Disfluencies (e.g., "uh", "um", repeated words, self-repairs) are prevalent in the spontaneous utterances of normal speakers. The modeling of disfluencies, however, is currently quite limited. This project models disfluencies at lexical, syntactic, and acoustic-prosodic levels. The goal is to gain insight into human communication and to develop algorithms that robustly recognize speech containing disfluencies.

The approach involves analysis of disfluencies in existing digitized corpora and in speech collected in controlled experiments. The investigation is undertaken by a team with expertise in complementary disciplines, including linguistics, psycholinguistics, and cognitive psychology. As the project enters its final phase, recent efforts at SRI have investigated how results of the descriptive research can be integrated into SRI's speech understanding system. In particular, SRI has developed methods for automatically detecting disfluencies using acoustic-prosodic information combined with specialized language models. Related studies by Co-Investigator Herbert Clark (Stanford) have focused on the syntactic properties of disfluencies and on their functional aspects. Additional related work by Stefanie Shattuck-Hufnagel (MIT) aims to understand the articulatory mechanisms involved in self-interruption, as well as the relationship between speech errors and sentence prosody.

PROJECT REFERENCES

H. H. Clark (1996). Using Language. Cambridge: Cambridge University Press.

H. H. Clark & J. E. Fox Tree (1997). Pronouncing "the" as "thee" to signal problems in speaking. Cognition, 62, 151-167.

E. E. Shriberg, R. Bates, and A. Stolcke (1997). A prosody-only decision-tree model for disfluency detection. To appear in Proc. EUROSPEECH, Rhodes, Greece.

E. E. Shriberg & A. Stolcke (1996). Word predictability after filled pauses: A corpus-based study. Proc. Intl. Conf. on Spoken Language Processing, 1868-1871, Philadelphia, PA.

E. E. Shriberg (1994). Preliminaries to a theory of speech disfluencies. PhD thesis, University of California at Berkeley.

S. Shattuck-Hufnagel (in press). Phrase-level phonology in speech production planning: Evidence for the role of prosodic structure. In M. Horne (ed.), A Festschrift for Gösta Bruce.

A. Stolcke & E. E. Shriberg (1996). Statistical language modeling for speech disfluencies. Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, 405-409, Atlanta, GA.

AREA BACKGROUND

Spoken language is the medium used first and foremost by humans for accurate and efficient interactive problem solving. As an input modality for human-computer interaction, spoken language can offer: (1) accessibility to an increasing number of people, including those with little or no training, (2) increased access to a growing set of data resources via telephone without a computer terminal, (3) increased power for those already familiar with computer technology, (4) an additional communication channel for more robust communication, for use in unusual environments, and for devices for the disabled, (5) flexibility of modality and use of computers by humans generally, and (6) increased applications and job opportunities in areas that will grow out of increased exposure of people to the potential of technology.

Although there has been significant work on some spontaneous speech phenomena, such as "slips of the tongue," other, much more frequent types of spontaneous speech "disfluencies" have been largely ignored, e.g., false starts, hesitations, filled pauses, and related phenomena. Such disfluencies are highly prevalent in normal human communication. Although disfluencies are less frequent in human-machine dialog, the causes and costs (e.g., in terms of cognitive load on the user) of this discrepancy are unknown. Further, because current speech understanding systems do not model disfluencies well, disfluencies that do occur are correlated with speech recognition and understanding errors. As spoken language systems evolve to allow more natural human-machine dialog, the rate of disfluencies is likely to rise toward the rates observed in human-human communication. A better understanding of disfluencies is critical to the development of a principled treatment of these highly frequent events in spontaneous speech.

AREA REFERENCES

P. A. Heeman and J. Allen (1994). Detecting and correcting speech repairs. Proc. 32nd Annual Meeting of the Association for Computational Linguistics, 295-302.

W. J. M. Levelt (1989). Speaking: From Intention to Articulation. Cambridge, Mass.: MIT Press.

C. H. Nakatani and J. Hirschberg (1994). A corpus-based study of repair cues in spontaneous speech. Journal of the Acoustical Society of America, 95(3), 1603-1616.

D. O'Shaughnessy (1994). Correcting complex false starts in spontaneous speech. Proc. ICASSP-94, Vol. I, 349-352.

S. L. Oviatt (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9, 19-35.

P. Price (1996). Spoken language understanding. In R. A. Cole (ed.), Survey of the State of the Art in Human Language Technology. Center for Spoken Language Understanding, Oregon Graduate Institute.

RELATED PROGRAM AREAS

Adaptive Human Interfaces, Usability and User-Centered Design, Intelligent Interactive Systems for Persons with Disabilities, Other Communication Modalities.

POTENTIAL RELATED PROJECTS

The project is related to a number of other efforts currently funded by NSF concerning the analysis and modeling of spontaneous speech, including: human sentence production, speech repair and dialog repair in spoken language systems, automatic recognition and natural language processing for conversational speech, and methods for manual and automatic disfluency annotation in large databases.