This section provides tutorials and demonstrations showing how we approach speech recognition and synthesis here at CSLU.
The section on building spoken dialogue systems gives a complete tutorial on the CSLU Toolkit's Rapid Application Developer (RAD), the Toolkit's high-level application development tool. RAD's graphical authoring environment lets users rapidly design and test spoken dialogue systems. It integrates the core technologies of facial animation, speech recognition and understanding, and speech synthesis with other useful features such as word-spotting, barge-in, dialogue repair, telephone and microphone interfaces, and open-microphone capability.
The section on natural-language understanding gives a tutorial on our robust natural-language parser called PROFER (Predictive, RObust, Finite-state parsER).
The section on spectrogram reading illustrates that it is possible to "read" speech using a visual display of the speech signal. This provides a basis for understanding the spectral-domain features used by nearly all speech recognition systems, and it illustrates why speech recognition is considered so difficult.
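To give a rough idea of what the "visual display" in a spectrogram contains, here is a minimal sketch in plain Python. It is not taken from the CSLU Toolkit; the function names `spectrogram` and `peak_frequencies`, the frame length, the hop size, and the synthetic two-tone signal are all assumptions chosen for illustration. The signal is cut into short overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform is computed, so that each frame becomes one column of the display.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    # Hamming window: tapers the frame edges to reduce spectral leakage.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; a real system would use an FFT instead.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequencies(frames, sample_rate, frame_len=64):
    """Return the frequency (Hz) of the strongest bin in each frame."""
    return [max(range(len(m)), key=m.__getitem__) * sample_rate / frame_len
            for m in frames]


def tone(freq, n_samples, sample_rate):
    """Synthetic sine tone (illustrative stand-in for recorded speech)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(n_samples)]


# Hypothetical test signal: a 1000 Hz tone followed by a 2000 Hz tone.
sample_rate = 8000
signal = tone(1000, 256, sample_rate) + tone(2000, 256, sample_rate)

frames = spectrogram(signal)
peaks = peak_frequencies(frames, sample_rate)
print(peaks[0], peaks[-1])
```

In this toy signal the strongest frequency bin moves from one tone to the other partway through. Real speech shows several such energy bands at once, shifting continuously over time, which is part of what makes both reading spectrograms and recognizing speech difficult.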
The section on speech recognition contains several tutorials. The first explains how neural-network-based speech recognition is performed. Next, there is an HTML-based tutorial on how to build a hybrid HMM/ANN-based speech recognizer using the Toolkit. There is also a PostScript-format tutorial on how to build either an HMM-based or an HMM/ANN-based speech recognizer using the Toolkit. Data for working through these tutorials is available in a ZIP file; the data files for the PostScript-format tutorial are split into two files, one for the HMM tutorial and the other for the HMM/ANN tutorial.
The section on text-to-speech (TTS) provides a link to the demos developed by the CSLU TTS group: an interactive speech synthesis demo, in which the user can type in any text and hear the synthesizer's output; the Singing Voice Synthesis demo; and a demonstration of voice conversion.
We also encourage researchers to check out PSL Tools, developed by the Perceptual Sciences Laboratory at the University of California, Santa Cruz. PSL Tools provides a set of tools for designing, conducting and analyzing the results of perceptual experiments. It allows users to manipulate auditory and visual stimuli for perceptual experiments; design interactive protocols for multi-media data presentation and multi-modal data capture; transcribe and analyze subjects' responses; perform statistical analyses; and summarize and display results using the Toolkit's visualization tools.