Evaluation of spoken language understanding systems (see chapter 13) is required to estimate the state of the art objectively. However, evaluation itself has been one of the challenges of spoken language understanding. Work on spoken language understanding in Canada, Europe, Japan and the U.S. is surveyed briefly below, and evaluation will be discussed in the following section.
Several sites in Canada, Europe and Japan have been researching spoken language understanding systems, including INRS in Canada, LIMSI in France, KTH in Sweden, the Center for Language Technology in Denmark, SRI International and DRA in the UK, and Toshiba in Japan. The five-year ESPRIT SUNDIAL project, which concluded in August 1993, involved several sites and the development of prototypes for train timetable queries in German and Italian and flight queries in English and French. All these systems are described in articles in [Eur93]. The special issue of Speech Communication on Spoken Dialogue [SF94] also includes several system descriptions, including those from NTT, MIT, Toshiba, and Canon.
In the ARPA program, the air travel planning domain has been chosen to
support evaluation of spoken language systems
[Pal91,Pal92,PDF+92,PFFG90,PFFG93,PFF+94,PFF+95].
Vocabularies for these systems are usually about 2000
words. The speech and language are spontaneous, though fairly planned
(since people are typically talking to a machine rather than to a
person, and often use a push-to-talk button). The speech
recognition utterance error rates in the December 1994 benchmarks were
about 13% to 25%. The utterance understanding error rates range from
6% to 41%, although about 25% of the utterances are considered
unevaluable in the testing paradigm, so the recognition and
understanding figures are not computed over the same set of utterances
[Pal91,Pal92,PDF+92,PFFG90,PFFG93,PFF+94,PFF+95].
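The remark that the two kinds of figures do not cover the same set of utterances can be illustrated with a small sketch. It is not the ARPA scoring procedure. It assumes, for illustration only, that recognition is scored on every utterance while understanding is scored only on utterances judged evaluable. The names `Utterance`, `utterance_error_rate`, `score` and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Utterance:
    # True if the recognizer transcribed the whole utterance correctly.
    recognized_correctly: bool
    # True/False for evaluable utterances; None marks an unevaluable one
    # (assumption: such utterances are left out of understanding scoring).
    understood_correctly: Optional[bool]


def utterance_error_rate(outcomes: Iterable[bool]) -> float:
    """Return the fraction of outcomes that are False."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("cannot compute an error rate over zero utterances")
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def score(utterances: List[Utterance]) -> Tuple[float, float, int]:
    """Return (recognition error, understanding error, evaluable count)."""
    # Recognition is scored over all utterances.
    recognition = utterance_error_rate(u.recognized_correctly for u in utterances)
    # Understanding is scored over the evaluable subset only.
    evaluable = [u for u in utterances if u.understood_correctly is not None]
    understanding = utterance_error_rate(u.understood_correctly for u in evaluable)
    return recognition, understanding, len(evaluable)


# Toy data (invented): 8 utterances, 2 of them (25%) unevaluable.
toy = [Utterance(True, True)] * 4 + [
    Utterance(False, True),
    Utterance(False, None),
    Utterance(True, None),
    Utterance(True, False),
]

if __name__ == "__main__":
    rec, und, n_eval = score(toy)
    print(f"recognition error over {len(toy)} utterances: {rec:.3f}")
    print(f"understanding error over {n_eval} utterances: {und:.3f}")
```

Because the denominators differ (8 utterances versus 6 in the toy data), the two rates cannot be compared directly or subtracted from one another without first restricting both to a common set.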
It may be that for limited domains, these error rates are compatible
with many potential applications. Since the rate of conversational
repairs in human-human dialogue can often fall within the ranges
observed for these systems, the limiting factor in applications may be
not the error rates so much as the ability of the system to manage and
recover from errors.