
State of the Art

Evaluation of spoken language understanding systems (see chapter 13) is required to estimate the state of the art objectively. However, evaluation itself has been one of the challenges of spoken language understanding. Work on spoken language understanding in Europe, Japan and the U.S. is surveyed briefly below, and evaluation is discussed in the following section.

Several sites in Canada, Europe and Japan have been researching spoken language understanding systems, including INRS in Canada, LIMSI in France, KTH in Sweden, the Center for Language Technology in Denmark, SRI International and DRA in the UK, and Toshiba in Japan. The five-year ESPRIT SUNDIAL project, which concluded in August 1993, involved several sites and the development of prototypes for train timetable queries in German and Italian and flight queries in English and French. All these systems are described in articles in [Eur93]. The special issue of Speech Communication on Spoken Dialogue [SF94] also includes several system descriptions, including those from NTT, MIT, Toshiba, and Canon.

In the ARPA program, the air travel planning domain has been chosen to support evaluation of spoken language systems [Pal91,Pal92,PDF92,PFFG90,PFFG93,PFF94,PFF95]. Vocabularies for these systems are usually about 2000 words. The speech and language are spontaneous, though fairly planned (since people are typically talking to a machine rather than to a person, and often use a push-to-talk button). In the December 1994 benchmarks, the speech recognition utterance error rates were about 13% to 25%. The utterance understanding error rates ranged from 6% to 41%, although about 25% of the utterances are considered unevaluable in the testing paradigm, so the recognition and understanding figures are not computed over the same set of utterances [Pal91,Pal92,PDF92,PFFG90,PFFG93,PFF94,PFF95]. It may be that for limited domains, these error rates are compatible with many potential applications. Since the rate of conversational repairs in human-human dialogue can often fall within the ranges observed for these systems, the limiting factor in applications may be not the error rates so much as the ability of the system to manage and recover from errors.
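
The point that the recognition and understanding figures are not computed over the same set of utterances can be illustrated with a small sketch. It is not the ARPA scoring procedure: the `Utterance` record, the two function names, and the toy data are all hypothetical, and the sketch simply assumes that unevaluable utterances are left out of the understanding score while still counting toward the recognition score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    # Hypothetical record; field names are assumptions made for this illustration.
    recognized_correctly: bool            # whole-utterance transcription matched the reference
    understood_correctly: Optional[bool]  # None marks an utterance judged unevaluable


def recognition_error_rate(utts: List[Utterance]) -> float:
    """Fraction of ALL utterances with at least one recognition error."""
    return sum(not u.recognized_correctly for u in utts) / len(utts)


def understanding_error_rate(utts: List[Utterance]) -> float:
    """Fraction of EVALUABLE utterances whose interpretation was wrong."""
    evaluable = [u for u in utts if u.understood_correctly is not None]
    return sum(not u.understood_correctly for u in evaluable) / len(evaluable)


# Toy data, invented purely to show the different denominators.
toy = [
    Utterance(True, True),
    Utterance(True, True),
    Utterance(False, True),   # misrecognized, yet still understood correctly
    Utterance(False, False),
    Utterance(True, None),    # unevaluable: excluded from the understanding score
    Utterance(True, True),
    Utterance(False, None),   # unevaluable: still counted for recognition
    Utterance(True, True),
]

print(recognition_error_rate(toy))    # denominator: all 8 utterances
print(understanding_error_rate(toy))  # denominator: the 6 evaluable utterances
```

Because the denominators differ, a system's understanding error rate can be lower than its recognition error rate without any contradiction, and the two numbers should not be compared directly.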






Maintained by Mike Noel and Wei Wei