PI: Jack Mostow
Robotics Institute, Language Technologies Institute, and Human-Computer
Interaction Institute, Carnegie Mellon University
Co-PI: Maxine Eskenazi
Language Technologies Institute, Carnegie Mellon University
Project LISTEN, 215 Cyert Hall
CMU-LTI, 4910 Forbes Avenue
Pittsburgh, PA 15213-3734
Phone: (412) 268-1330
Fax: (412) 268-6298
Dr. Maxine Eskenazi (max@cs.cmu.edu) - recording, labeling, database structure, final data processing
Dr. Jack Mostow (mostow@cs.cmu.edu) - forced alignment, automatic generation of files, children and reading
Children's speech database, Project LISTEN, continuous speech recognition, oral reading
Proposal Summary
The objective of this project is to create a database of children's read speech with the digital quality needed for speech research. Such data is unavailable today, but essential to achieving effective speech communication between children and machines.
The methods to be employed include selecting appropriate texts to be read, recording good and poor readers, labelling the speech, verifying the quality of the labels, and organizing the data into an easily accessible form.
A database of children's speech is a key enabling condition for applications in education, entertainment, and other socially and economically important fields. The expected direct impact of the work includes making the database available to the Linguistic Data Consortium for distribution to speech researchers.
Applications include an automated reading coach that listens to children read aloud, and helps when needed. A prototype of such a coach was developed with previous NSF support, but its usability and robustness were limited because its speech recognizer was trained on adult speech. A children's speech database is necessary to achieve the required accuracy in listening.
Brief Summary of Progress to Date and Work Still to be Performed
This database consists of sentences read aloud by children. The children ranged in age from 6 to 11 and were mostly in first through third grades at the time of recording. There are 76 speakers (24 male and 52 female) and 5180 utterances in all.
As of June 1997, the database is undergoing final verification prior to CD-ROM publication of the corpus by the Linguistic Data Consortium at the University of Pennsylvania.
David Graff at the LDC provided assistance in adapting the data files and organizing the corpus for publication; he also created or regenerated the various table files to be consistent with the published form of the corpus, and contributed to the documentation.
M. Eskenazi. KIDS: A database of children's speech. J. Acoust. Soc. Am. 100:4, part 2, December, 1996. Abstract: We have collected a database of children reading age- and reading-level-appropriate text aloud. This (labelled) data, to be distributed in the near future, was primarily intended to be used in CMU's LISTEN tutor which employs speech recognition to monitor children's reading and then help correct errors. The speaker population was therefore chosen to represent good and poor readers and to incorporate dialects of the speakers for whom the reading coach is intended. Phonemic balance could not be achieved (although it has been calculated) since the primary concern in recording children reading is to present sentences that can effectively be read by first through third graders. The text is a series of sentences we adapted from text in the Weekly Reader series - most of the adaptation concerned the lack of the accompanying images. The text was chosen for its intrinsic interest and widespread use. Several trial recording sessions allowed us to develop a protocol that kept extraneous noises produced by the children at a minimum. We will discuss this and other problems inherent in recording children reading. Novel techniques developed for labelling this kind of speech will also be presented. This work was funded by NSF Grant No. IRI-9528984.
A corpus of transcribed children's speech is essential to train speech recognizers to perform accurately on children's speech for applications such as Project LISTEN's Reading Tutor. However, children's speech has been virtually absent from publicly available speech databases of the size and quality required.
J. Mostow, S. Roth, A. G. Hauptmann, and M. Kane. A Prototype Reading Coach that Listens. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pages 785-792. American Association for Artificial Intelligence, Seattle, WA, August, 1994. Recipient of the AAAI-94 Outstanding Paper Award.
A. Waibel and K.-F. Lee. Readings in Speech Recognition. Morgan Kaufmann, San Mateo, CA, 1990.
Use children's speech to study characteristics of children's speech, or to train speech recognizers for other applications.