
13.8 Usability and Interface Design

Sharon Oviatt
Oregon Graduate Institute of Science & Technology, Portland, Oregon, USA

To date, the development of spoken language systems primarily has been a technology-driven phenomenon. As speech recognition has improved, progress traditionally has been documented in the reduction of word error rates [PFF94]. However, reporting word error rate fails to express the frustration typically experienced by users who cannot complete a task with current speech technology [RW93]. Although the successful design of interfaces is essential to supporting usable spoken language systems, research on human-computer spoken interaction currently represents a gap in our scientific knowledge. Moreover, this gap is widely recognized as having generated a bottleneck in our ability to deploy robust speech technology in actual field settings.
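The text does not define word error rate. As a minimal sketch, assuming the conventional definition (substitutions, deletions, and insertions in the best word-level alignment, divided by the number of words in the reference transcript), it could be computed as shown here. The function name and the sample utterances are hypothetical and do not come from the cited work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization and equal cost (1) for each
    substitution, deletion, and insertion.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("john" -> "jon") and one
# deletion ("at") against a five-word reference.
wer = word_error_rate("call john smith at home", "call jon smith home")
```

A score of this kind counts every word equally, so it cannot show whether the misrecognized word was the one the task depended on. That is one way to read the limitation described above.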

Among other challenges, interfaces will be needed that can guide users' spontaneous speech to coincide with system capabilities, since spontaneous speech is known to be particularly variable along a number of linguistic dimensions [CHA95]. Interface techniques for successfully constraining spoken input have been studied most extensively by the telecommunications industry as it strives to automate operator services [KD91,Spi91]. Such work has emphasized the need for realistic and situated user testing, often in field settings, and has shown that dramatic variation can occur in the successful elicitation of target speech depending on the type of system prompt.

Other research has demonstrated that the principle of linguistic convergence, or the tendency of people's speech patterns to gravitate toward those of their interactive partner, can be employed to guide wordiness, lexical choice, and grammatical structure during human-computer spoken interactions, without imposing any explicit constraints on user behavior [ZF91]. In addition, research has shown that difficult sources of variability in human speech (e.g., disfluencies, syntactic ambiguity) can be reduced by a factor of two to eight through alteration of interface parameters [Ovi95,OCW94]. Such work demonstrates the potential impact that interface design can have on managing spoken input, although interface techniques have been underexploited for this purpose. In all of these areas, research typically has involved proactive performance assessment using simulation techniques, which is the preferred method of evaluating systems that are still in the planning stages.

13.8.1 Future Directions

Many basic issues need to be addressed before technology can fully exploit the natural advantages of speech, including the speed, ease, spontaneity, and expressive power that people experience when using it during human-human communication. For example, research is needed to evaluate different types of natural spoken dialogue, spontaneous speech characteristics and their management, and dimensions of human-computer interactivity that influence spoken communication. With respect to the latter, research is especially needed on optimal delivery of system confirmation feedback, error patterns and their resolution, flexible regulation of conversational control, and management of users' inflated expectations of the interactional coverage of spoken language systems. In addition, the functional role that ultimately is most suitable for speech technology needs to be evaluated further. Finally, assessment is needed of the potential usability advantages of multimodal systems incorporating speech over unimodal speech systems, with respect to breadth of utility, ease of error handling, learnability, flexibility, and overall robustness [CO94,CHA95]. To support all of these research agendas, tools will be needed for building and adapting high-quality, semiautomatic simulations. Such an infrastructure can be used to evaluate the critical performance tradeoffs that designers will encounter as they strive to design more usable spoken language systems.


