¹ Speech Technology and Research Laboratory, SRI International
² Department of Psychology, Stanford University
³ Research Laboratory of Electronics, Massachusetts Institute of Technology
The approach involves analysis of disfluencies in existing digitized corpora and in speech collected in controlled experiments. The investigation is undertaken by a team representing expertise in complementary disciplines, including linguistics, psycholinguistics, and cognitive psychology. As the project enters its final phase, recent efforts at SRI have investigated how results of the descriptive research can be integrated into SRI's speech understanding system. In particular, SRI has developed methods for automatically detecting disfluencies, using acoustic-prosodic information combined with specialized language models. Related studies by Co-Investigator Herbert Clark (Stanford) have focused on syntactic properties of disfluencies and on their functional aspects. Additional related work by Stefanie Shattuck-Hufnagel (MIT) aims to understand the articulatory mechanisms involved in self-interruption, as well as the relationship between speech errors and sentence prosody.
H. H. Clark (1996). Using Language. Cambridge: Cambridge University Press.
H. H. Clark & J. E. Fox Tree (1997). Pronouncing "the" as "thee" to signal problems in speaking. Cognition, 62, 151-167.
E. E. Shriberg, R. Bates, and A. Stolcke (1997). A prosody-only decision-tree model for disfluency detection. To appear in Proc. EUROSPEECH, Rhodes, Greece.
E. E. Shriberg & A. Stolcke (1996). Word predictability after filled pauses: A corpus-based study. Proc. Intl. Conf. on Spoken Language Processing, 1868-1871, Philadelphia, PA.
E. E. Shriberg (1994). Preliminaries to a theory of speech disfluencies. PhD thesis, University of California at Berkeley.
S. Shattuck-Hufnagel (in press). Phrase-level phonology in speech production planning: evidence for the role of prosodic structure. In A Festschrift for Gösta Bruce, Ed. Merle Horne.
A. Stolcke & E. E. Shriberg (1996). Statistical language modeling for speech disfluencies. Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, 405-409, Atlanta, GA.
Although significant work has been devoted to some spontaneous speech phenomena, such as "slips of the tongue," other, much more frequent types of spontaneous speech "disfluencies" have been largely ignored, e.g., false starts, hesitations, filled pauses, and related phenomena. Such disfluencies are highly prevalent in normal human communication. Although disfluencies are less frequent in human-machine dialog, the causes and costs (e.g., in terms of cognitive load on the user) of this discrepancy are unknown. Further, because current speech understanding systems do not model disfluencies well, the disfluencies that do occur are correlated with speech recognition and understanding errors. As spoken language systems evolve to allow more natural human-machine dialog, the rate of disfluencies is likely to rise toward the rates observed in human-human communication. A better understanding of disfluencies is critical to the development of a principled treatment of these highly frequent events in spontaneous speech.
P. A. Heeman and J. Allen (1994). Detecting and correcting speech repairs. Proc. 32nd Annual Meeting of the Association for Computational Linguistics, 295-302.
W. J. M. Levelt (1989). Speaking: From Intention to Articulation. Cambridge, Mass.: MIT Press.
C. H. Nakatani and J. Hirschberg (1994). A corpus-based study of repair cues in spontaneous speech. Journal of the Acoustical Society of America, 95(3), 1603-1616.
D. O'Shaughnessy (1994). Correcting complex false starts in spontaneous speech. Proc. ICASSP-94, Vol. I, 349-352.
S. L. Oviatt (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9, 19-35.
P. Price (1996). Spoken Language Understanding. In R. A. Cole (ed.), Survey of the State of the Art in Human Language Technology, Center for Spoken Language Understanding, Oregon Graduate Institute.