Speech Disfluencies in Spoken Language Systems: A Dialog-Centered Approach

Susan E. Brennan and Michael F. Schober

State University of New York at Stony Brook and The New School for Social Research

CONTACT INFORMATION

Susan E. Brennan
Department of Psychology
SUNY at Stony Brook
Stony Brook, NY 11794-2500
Phone: (516) 632-9145
Fax: (516) 632-7876
Email: susan.brennan@sunysb.edu

WWW PAGE

http://www.psy.sunysb.edu/sbrennan/

PROGRAM AREA

Speech and Natural Language Understanding

KEYWORDS

Speech, dialog, miscommunication, repair, production and comprehension of disfluencies

PROJECT SUMMARY

We are testing two main hypotheses about human speech disfluencies (such as um, uh, restarts, and hesitations). The first hypothesis is that disfluencies, which today's spoken language systems either ignore or attempt to edit out, may actually be exploitable resources. That is, they may provide information (to both human listeners and spoken language systems) about speakers' planning difficulties and metacognitive states. Our first set of psychological studies examines the information present in disfluencies, how humans deal with disfluencies they hear in conversation, and the extent to which speech disfluencies help or hinder comprehension.

The second main hypothesis is that a dialog-centered approach to disfluencies should improve a spoken language system's ability to deal with them. The proposal is for systems to use multimodal feedback strategies adapted from conversation to repair misinterpretations arising from disfluencies. Our second set of studies will examine how speakers can use various kinds of feedback from a spoken language system to collaboratively repair problems arising from disfluencies, as well as whether some feedback strategies lead to further disfluencies. One goal is to understand the kind of feedback that is most helpful in interacting with a spoken language system. Another goal is to work toward an evaluation metric for spoken language systems that is truly dialogic: it isn't simply a system's faults that should count for evaluation, but the system's ability to recover from those faults. Our collaborator on this phase of the project is Eric Hulteen of Apple Computer, Inc.

Finally, we are examining spontaneous speech corpora to determine which factors affect the rates and types of disfluencies (including the speaker's age and gender, the topic being discussed, and the speaker's prior experience with an addressee).
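As a rough illustration of what a disfluency rate could look like in a corpus analysis of this kind, the sketch below counts filled pauses per 100 words for each speaker group. It is only a minimal sketch under stated assumptions, not the coding scheme used in the project: the filler inventory, the function name, the (group, transcript) input format, and the choice to count fillers in the word total are all hypothetical, and restarts and hesitations are not handled.

```python
from collections import defaultdict

# Hypothetical filler inventory (assumption). A real coding scheme would
# also need to cover restarts, repeats, and silent hesitations.
FILLERS = {"um", "uh"}


def filler_rate_per_100_words(utterances):
    """Return fillers per 100 words for each group label.

    `utterances` is an iterable of (group_label, transcript_text) pairs.
    Fillers are counted as words in the denominator (an assumption;
    conventions differ across studies).
    """
    word_counts = defaultdict(int)
    filler_counts = defaultdict(int)
    for group, text in utterances:
        # Lowercase each token and strip simple surrounding punctuation.
        tokens = [t.strip(".,?!").lower() for t in text.split()]
        tokens = [t for t in tokens if t]
        word_counts[group] += len(tokens)
        filler_counts[group] += sum(1 for t in tokens if t in FILLERS)
    # Skip groups with no words to avoid dividing by zero.
    return {
        group: 100.0 * filler_counts[group] / word_counts[group]
        for group in word_counts
        if word_counts[group] > 0
    }


# Made-up transcripts, for illustration only.
sample = [
    ("older", "um I think uh the red one"),
    ("younger", "the red one"),
    ("older", "yes that one"),
]
rates = filler_rate_per_100_words(sample)
```

Rates computed this way could then be compared across age, gender, topic, or familiarity with the addressee, once each utterance is labeled with the relevant factor.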

This research will contribute to the development of psycholinguistic theory about processing and comprehension of disfluent utterances. It will give a broader idea of the range and rates of disfluencies that occur in human-human and human-computer dialog. It also contributes to important practical goals concerning spoken language systems, such as how to best cope with disfluent speech, how to predict when speech will be disfluent, and how to make speech recognition technology more usable through the design of optimal feedback.

PROJECT REFERENCES

Brennan, S. E. & Hulteen, E. (1995). Interaction and feedback in a spoken language system: A theoretical framework. Knowledge-Based Systems, 8, 143-151.

Brennan, S. E. & Williams, M. (1995). The feeling of another's knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers. Journal of Memory and Language, 34, 383-398.

Brennan, S. E. & Kipp, E. G. (1996). An addressee's knowledge affects a speaker's use of fillers in question-answering. Abstracts of the Psychonomic Society, 37th Annual Meeting, Chicago, IL (p. 24).

Brennan, S. E. (1996). Lexical entrainment in spontaneous dialog. Proceedings, 1996 International Symposium on Spoken Dialogue (ISSD-96). Philadelphia, PA, pp. 41-44.

Brennan, S. E. (in press, 1997). The grounding problem in conversation with and through computers. In S. R. Fussell & R. J. Kreuz (Eds.), Social and cognitive psychological approaches to interpersonal communication. Mahwah, NJ: Lawrence Erlbaum.

Brennan, S. E. & Schober, M. F. (submitted, 1997). When do speech disfluencies help comprehension? Abstracts of the Psychonomic Society, 38th Annual Meeting.

Brennan, S. E. (To appear). The vocabulary problem in spoken dialog systems. In S. Luperfoy (Ed.), Automated spoken dialog systems, Cambridge, MA: MIT Press.

Brennan, S. E. & Cahn, J. E. (in preparation). Modeling the progression of mutual understanding in dialog.

Bortfeld, H., Leon, S., Bloom, J. E., Schober, M. F., & Brennan, S. E. (in preparation). Conversational speech disfluencies as a function of speaker's age, sex, task domain, role, and relationship with addressee.

AREA BACKGROUND

More broadly, we study the psychology of language use, with the goal of understanding psycholinguistic phenomena in communicative contexts. Interactive partners in these contexts may be either humans or machines. They may be communicating using text, speech, gesture, or graphics; they may or may not be copresent in time and space.

Traditional psycholinguistic paradigms consider production or comprehension in isolation and examine processing on one "level" at a time. However, in conversation, the primary setting for language use, people act as both speakers and addressees, processing information on many levels at once. We try to examine phenomena such as referring, lexical choice, prosody, articulation, and understanding in natural and spontaneous (but still controlled) discourse contexts. Rather than testing individuals on their language production in the absence of goals and addressees or on their comprehension of idealized sentences, many of our experiments examine pairs of people interacting or interpreting utterances drawn from spontaneous conversations, as well as people interacting with real or simulated systems. This approach is labor-intensive, especially when it requires inventing new tasks and measurement techniques. However, we think it is important to bridge low-level and high-level psycholinguistic processing, since language use is a complex and contextualized activity with both cognitive and social components.

Any form of interaction, including dialog, necessitates that partners ground their actions. Grounding is a process by which partners seek and provide evidence of how actions and utterances have been understood. This process underlies not only the coordination of actions with an interactive partner, but also the repair of misunderstandings. In human-computer interaction, just as in human conversation, neither partner is omniscient; the most successful computer interfaces (such as some direct manipulation interfaces) are those that enable people and systems to ground easily. A long-term goal is to improve the grounding process with language and speech interfaces.

The implications and applications of our research span Linguistics and Computer Science as well as our home discipline, Psychology.

AREA REFERENCES

Colleagues in the Interactive Systems community and in industry whose work is relevant to ours include (but are not limited to!) James Allen, Susan Boice, Ron Cole, Peter Heeman, Graeme Hirst, Susann Luperfoy, Susan McRoy, David Novick, Sharon Oviatt, Patti Price, Chris Schmandt, Elizabeth Shriberg, Ronnie Smith, David Traum, Lyn Walker, and Steve Whittaker, and their collaborators. Rather than refer to all of them here, we include only a few references from the field of psycholinguistics:

Brennan, S. E. & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 22(6), 1482-1493.

Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127-149). Washington, D.C.: APA. Reprinted in R. M. Baecker (Ed.), Groupware and computer-supported cooperative work: Assisting human-human collaboration. San Mateo, CA: Morgan Kaufmann Publishers, Inc.

Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.

Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.

Fox Tree, J. E., & Clark, H. H. (in press). Pronouncing "the" as "thee" to signal problems in speaking. Cognition.

Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.

Schober, M. F., & Clark, H. H. (1989). Understanding by addressees and overhearers. Cognitive Psychology, 21, 211-232.

RELATED PROGRAM AREAS

(3) Other Communication Modalities, (4) Adaptive Human Interfaces

POTENTIAL RELATED PROJECTS

We are very much open to collaboration, and look forward to discussing new project ideas at the workshop.