This book is the work of many different individuals whose common bond is a love for understanding and using spoken language, both between humans and with machines. I was fortunate enough to have been included in this community through the work of one of my students, Alan Goldschen, who brought to my attention almost a decade ago the intriguing problem of lipreading. Our unfinished quest for a machine that could recognize speech more robustly via acoustic and optical channels was my original motivation for entering the wide world of spoken language research so richly exemplified in this book.
I have been credited with producing the small spark which began this truly joint international work via a small National Science Foundation (NSF) award, and a parallel one abroad, while I was a rotating program officer in the Computer and Information Science and Engineering Directorate. We should remember that the International Division of NSF also contributed to the work of U.S. researchers, as did the European Commission for others in Europe. The spark occurred at a dinner meeting convened by George Doddington, then of ARPA, during the 1993 Human Language Technology Workshop at the Merrill Lynch Conference Center in New Jersey. I made the casual remark to Antonio Zampolli that I thought it would be interesting and important to summarize, in a unifying piece of work, the most significant research taking place worldwide in this field. Mark Liberman, present at the dinner, was also very receptive to the concept. Zampolli heartily endorsed the idea and took it to Nino Varile of the European Commission's DG XIII. I did the same and presented it to my boss at the NSF, the very supportive Y. T. Chien, and we proceeded to recruit some likely suspects for the enormous job ahead. Both Nino and Y. T. were infected with the enthusiasm to see this work done. The rest is history, mostly punctuated by fascinating "editorial board" meetings and the gentle but unforgiving prodding of Ron Cole. Victor Zue was, on my side, a pillar of technical strength and a superb taskmaster. Several European contributors distinguished themselves in the work, including Annie Zaenen and Hans Uszkoreit. From my perspective, though, it was Joseph Mariani, with his Human-Machine Communication group at LIMSI/CNRS, who brought to my attention the tip of the enormous iceberg of research in Europe on speech and language, making it obvious to me that the state-of-the-art survey must be done.
From a broad perspective, it is not surprising that this daunting task has taken so much effort: witness the wide range of topics related to language research, from generation and perception to higher-level cognitive functions. The thirteen chapters that have been produced are a testament to the depth and breadth of research that is necessary to advance the field. I feel gratified by the contributions of people with such a variety of backgrounds, and I feel particularly happy that Computer Scientists and Engineers are becoming more aware of this and are making significant contributions. But in spite of the excellent work done in reporting, the real task ahead remains: the deployment of reliable and robust systems which are usable in a broad range of applications, or as I like to call it, "the consumerization of speech technology." I personally consider the spoken language challenge one of the most difficult problems among the scientific and engineering inquiries of our time, but one that promises an enormous reward. Gordon Bell, of computer architecture fame, once confided that he had looked at the problem, thought it inordinately difficult, and moved on to work in other areas. Perhaps this survey will motivate new Gordon Bells to dig deeper into research in human language technology.
Finally, I would like to encourage any young researcher reading this survey to plunge into the areas of most significance to them, but in an unconventional and brash manner, as I feel we did in our work in lipreading. Deep knowledge of the subject is, of course, necessary, but the boundaries of the classical work should not be limiting. I feel strongly that there is need and room for new and unorthodox approaches to human-computer dialog that will reap enormous rewards. With the advent of world-wide networked graphical interfaces, there is no reason not to include the speech interactive modality in them, at great benefit and relatively low cost. These network interfaces may further erode the international barriers which travel and other means of communication have obviously started to tear down. Interfacing with computers sheds much light on how humans interact with each other, something that spoken language research has taught us.
The small NSF grant to Ron Cole, I feel, has yielded magnified results. The resources of the original sponsors have been generously extended by those of the Center for Spoken Language Understanding at the Oregon Graduate Institute, and their personnel, as well as by the University of Pisa. From the point of view of an ex-program officer in the IRIS Division at NSF, this grant has paid great dividends to the scientific community. We owe an accolade to the principal investigator's Herculean efforts and to his cohorts at home and abroad.