TOWARD A THEORY OF METACONTROL FOR DIALOGUE SYSTEMS

Alan W. Biermann

Department of Computer Science
Duke University

CONTACT INFORMATION

Department of Computer Science
Duke University
Box 90129
Durham, NC 27708-0129
Phone: (919) 660-6500
Fax: (919) 660-6519
Email: awb@cs.duke.edu

WWW PAGE

http://www.cs.duke.edu/cgi-bin/facinfo?awb

PROGRAM AREA

Speech and Natural Language Understanding.

KEYWORDS

Voice interactive systems, human-machine collaboration, multimedia systems, user modeling, dialogue theory, human factors

PROJECT SUMMARY

When humans collaborate with each other, they undertake a variety of behaviors that enable fast and efficient convergence to the goal. Each participant is continuously involved in mental problem solving and when one or the other sees a solution, he or she will announce it. Frequently, however, there will be obstacles to the solution and attention will focus on these. What are the critical paths to success and what must be done to overcome the obstacles? The participants will open dialogue on these difficulties and bring to bear resources towards their solution.

This project seeks to embed in the machine the facilities that enable it to cooperate with a human in the same way. Specifically, knowledge for problem solving is coded in the machine with Prolog-style rules. The rules are used to attempt to prove that the goal is solved. If the proof is successful, then very little dialogue will follow. But if the proof is not successful, then the system will look for key "missing axioms" in the proof and initiate dialogue to attempt to find the needed information. The result is an aggressive interaction that addresses one problem and then another in sequence as the roadblocks to success are discovered. The dialogue jumps from one issue to the next, occasionally giving up on one, returning to a previous topic, opening a different one, and so forth until a set of steps is found that achieves success. Within this context, a series of issues arises and some of them are listed here:

(a) User Modeling: When the system is prepared to seek new information, it needs a measure of what the user is able to do and what is an appropriate level for posing the question. For example, the user may be very naive and be able to respond only to the most elementary requests, or the user may be expert and be able to respond easily to high level questions. In the context of the current dialogue system, the information known to the user (the user model) is coded in the same Prolog-style rules so that the query mechanism can properly adapt to the user without any additional mechanisms. The system asks only questions that are answerable by the user and at the level appropriate to the specific user as indicated by the user modeling information.
(b) Variable Initiative: Efficient interaction depends on giving the initiative to the appropriate participant at each instant of time. The participant who knows most about how to solve some particular subgoal is usually the best one to lead the interaction, and the mechanism to pass the initiative back and forth can be built into the dialogue system. Specifically, when the next subgoal on the proof tree is to be selected, the participant with the initiative is allowed to make that choice.
(c) Multimedia Interaction: The computations are in terms of the logical notation of the Prolog system. When the system communicates with the user, whether to receive information or to present it, the message can be coded in any form: typed or spoken natural language, displayed graphics, or some combination of input-output forms. The translation between the logical notation and these forms can be done by multimedia grammars.
(d) User Adaptive Communication: The use of multimedia grammars raises many questions about the best representation of a particular idea. A presentation could use spoken output, a displayed text message, a graphic image on the screen with an associated voiced comment, and many other forms. Our project has developed a way to enable the user to punish the system for undesirable output forms and thereby train the system to adapt its output to fit his or her momentary preferences.
(e) Dialogue Robustness: A major current emphasis is the adaptation of the dialogue machinery to achieve very high levels of task robustness. The system may fail on individual interactions, but the theorem proving mechanisms can be styled to embed very detailed background information. This enables dialogue with a variety of approaches to problem solution and an extensive capability to provide supporting discussion leading toward a particular difficult goal. These capabilities lead to tenacious dialogue that tirelessly seeks success through variety and repetition until it is achieved.
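
The proof-driven questioning and the user-model filter described above can be illustrated with a small sketch. This is not the project's implementation: it is written in Python rather than Prolog, the rule base and all names (RULES, missing_axioms, run_dialogue, ask) are hypothetical, and the user model is simplified to a plain set of goals the user can answer directly, whereas the project codes it in the same Prolog-style rules. The rules are assumed to be acyclic, and variable initiative is left out.

```python
# Hypothetical rule base: each goal maps to a list of alternative rule bodies.
RULES = {
    "circuit_works": [["power_on", "led_lit"]],
    "led_lit": [["wire_connected", "led_ok"]],
}


def missing_axioms(goal, facts, rules, askable):
    """Try to prove `goal` by backward chaining.

    Returns [] if the goal is proved from `facts`, a list of questions
    (the "missing axioms") if the proof could be completed by asking the
    user, or None if no rule and no askable question can establish it.
    """
    if goal in facts:
        return []
    if goal in askable:
        # The user model says this user can answer at this level.
        return [goal]
    best = None
    for body in rules.get(goal, []):
        needed = []
        for subgoal in body:
            sub = missing_axioms(subgoal, facts, rules, askable)
            if sub is None:
                needed = None  # this alternative is a dead end
                break
            needed.extend(sub)
        # Prefer the alternative that requires the fewest questions.
        if needed is not None and (best is None or len(needed) < len(best)):
            best = needed
    return best


def run_dialogue(goal, rules, user_can_answer, ask):
    """Ask one missing axiom at a time until the goal is proved or hopeless.

    `ask` is an assumed callback that poses a question and returns a bool.
    Returns (success, transcript of questions asked).
    """
    facts, denied, transcript = set(), set(), []
    while True:
        needed = missing_axioms(goal, facts, rules, user_can_answer - denied)
        if needed is None:
            return False, transcript
        if not needed:
            return True, transcript
        question = needed[0]
        transcript.append(question)
        if ask(question):
            facts.add(question)
        else:
            denied.add(question)


# Two hypothetical user models: the expert can answer a higher-level question.
NOVICE = {"power_on", "wire_connected", "led_ok"}
EXPERT = NOVICE | {"led_lit"}
```

In this sketch, calling run_dialogue("circuit_works", RULES, EXPERT, ask) poses the higher-level question "led_lit" directly, while the same call with NOVICE descends to the elementary questions "wire_connected" and "led_ok". When the proof already succeeds from known facts, no question is asked at all, which matches the observation that a successful proof leads to very little dialogue.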

Implemented Systems

Our project has implemented several speech-interactive dialogue systems to test ideas and to gain experience with them. Two examples are our Circuit-Fixit-Shoppe and Programming Tutor systems. The Circuit-Fixit-Shoppe was completed in 1991 and tested as described in the references listed below. This system demonstrated many of the characteristics described above and was successfully used by human subjects in 141 problem solving sessions to find bugs in and repair electric circuits. The Programming Tutor is currently operative, has a much cleaner and simpler design, and has full graphics and typed text communication as well as speech for a full multimedia capability.

Both systems have been tested extensively with human subjects, with high success rates. Success in problem solving was in the 80 percent-plus range, speaking rates were as high as several sentences per minute, sentence recognition rates were in the 80 percent range, and user subjective responses were very positive.

PROJECT REFERENCES

Ronnie W. Smith and D. Richard Hipp, Spoken Natural Language Dialog Systems, Oxford University Press, New York, 1994.

Ronnie W. Smith, D. Richard Hipp, and Alan W. Biermann, "An Architecture for Voice Dialogue Systems Based on Prolog-Style Theorem Proving," Computational Linguistics, Vol. 21, No. 3, September, 1995.

Curry I. Guinn, "Mechanisms for Mixed Initiative Human-Computer Collaborative Discourse," 34th Annual Meeting of the ACL, Santa Cruz, June 24-27, 1996.

Curry I. Guinn, "Dialogue Mechanisms for Conflict Resolution in Natural Language Discourse," 1996 Symposium On Human Interaction With Complex Systems, Dayton, Ohio, August 26-28, 1996.

Alan W. Biermann and Philip M. Long, "The Composition of Messages in Speech-Graphics Interactive Systems," International Symposium on Spoken Dialogue, Philadelphia, Penn., October 2-3, 1996.

Alan W. Biermann, Curry I. Guinn, Michael S. Fulkerson, Gregory Keim, Zheng Liang, Douglas M. Melamed, Krishnan Rajagopalan, "Goal-Oriented Multimedia Dialogue with Variable Initiative," to be presented at the International Symposium on Methodologies for Intelligent Systems-1997, Charlotte, North Carolina, October 15-18, 1997.

AREA BACKGROUND

This general area is extremely broad and includes fields related to every stage of processing: speech recognition, parsing theory, semantics theory, representation of knowledge, collaborative theory, dialogue theory, natural language generation, speech generation, multimedia communication, user modeling, and much more.

AREA REFERENCES

Computational Linguistics, the journal.

James Allen, Natural Language Understanding, Second Edition, Benjamin/Cummings Publishing Company, Inc., 1994.

RELATED PROGRAM AREAS

Virtual Environments
Other Communicative Modalities
Adaptive Human Interfaces
Usability and User-Centered Design
Intelligent Interactive Systems for Persons with Disabilities

POTENTIAL RELATED PROJECTS

Related projects are numerous and include the following: the human factors of voice interactive problem-solving systems, learning for optimization of dialogue performance, strategies for tutoring and their automation, and studies of multimedia systems.