
Interactive Model of the Vocal Folds and Turbulent Noise for Speech Synthesis

Donald G. Childers

University of Florida
Dept. of Electrical and Computer Engineering

CONTACT INFORMATION

Dept. of Electrical and Computer Engineering
P. O. Box 116130,
9 Engr. Bldg., Bldg. 33
University of Florida
Gainesville, FL 32611-6130
Email: childers@drwho.ee.ufl.edu
Tel: 352-392-2633
Fax: 352-392-0044

WWW PAGE

http://www.eel.ufl.edu/~childers/

PROGRAM AREA

Speech and Natural Language Understanding

KEYWORDS

Speech synthesis, speech analysis, speech quality, vocal folds

PROJECT SUMMARY

There are two major aspects of the proposed research. The first is to model the vibratory motion of the vocal folds. The second is to improve the modeling, and thereby the synthesis, of fricatives, plosives, and affricates with the articulatory speech synthesizer. The latter task will require models of the excitation for these phonemes, for which we propose two approaches: spectral shaping and nonlinear dynamic modeling. Nonlinear dynamic modeling will also be included in the vocal fold model. Through interactive articulatory speech synthesis, both the vocal fold model and the noise source model will be used to improve our understanding of the cause-and-effect relationships between voice production features and aspects of the acoustic signal. The results of this research will lead to improvements in our existing models of several voice types (modal register, vocal fry, and breathy). In addition, we will be able to extend our models to other voice types (e.g., whisper, falsetto, and harsh).
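As a rough illustration of what "spectral shaping" of a noise excitation can mean, the sketch below passes white Gaussian noise through a single two-pole digital resonator of the kind used in Klatt-style formant synthesizers. This is an assumed, simplified form, not the project's noise source model: the function names `resonator` and `shaped_noise` are hypothetical, and the center frequency (4 kHz), bandwidth (1 kHz), and 16 kHz sampling rate are illustrative values only.

```python
import math
import random


def resonator(x, fc, bw, fs):
    """Two-pole digital resonator with unity gain at 0 Hz.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    """
    t = 1.0 / fs
    r = math.exp(-math.pi * bw * t)                 # pole radius set by the bandwidth
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * fc * t)  # pole angle set by the center frequency
    a = 1.0 - b - c                                 # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn                             # shift the output delay line
    return out


def shaped_noise(n, fc=4000.0, bw=1000.0, fs=16000, seed=0):
    """Hypothetical fricative-like source: white Gaussian noise shaped by one resonator."""
    rng = random.Random(seed)                       # seeded so results are reproducible
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return resonator(white, fc, bw, fs)
```

With these values the resonator boosts components near 4 kHz by roughly a factor of five relative to components near 500 Hz. A single resonator was chosen only for brevity; a realistic frication source would presumably use several shaping filters with parameters fitted to a talker's recordings.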

The interactive vocal fold model will provide a way to view aspects of the three-dimensional vibratory motion of the vocal folds. It will also serve as a predictive model of phonatory acoustics and as a model of glottal aerodynamics, with a means for calculating the glottal area, the vocal tract area, and the volume velocity of the airflow. The motion of the vocal folds in the model will be controlled by several methods, including sinusoidal functions and nonlinear dynamic modeling. A major motivation for this model is that imaging and other methods of measuring vocal fold vibratory motion are difficult.
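To make the link between sinusoidal control of the vocal folds, glottal area, and volume velocity concrete, the sketch below drives the glottal area with an offset sinusoid (clipped at zero to represent the closed phase) and converts area to flow with the quasi-steady Bernoulli orifice relation U = A_g * sqrt(2 * P_s / rho). This is a simplified assumption, not the proposed model: it ignores vocal tract loading, air inertance, and flow separation, and omits the vocal tract area entirely. The function names and all parameter values (f0 = 100 Hz, a subglottal pressure of 800 Pa, air density 1.14 kg/m^3, the area offset and amplitude) are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameter values chosen for illustration only.
F0 = 100.0        # fundamental frequency, Hz
A_MEAN = 2.0e-6   # offset of the area sinusoid, m^2 (0.02 cm^2)
A_AMP = 6.0e-6    # amplitude of the area sinusoid, m^2 (0.06 cm^2)
P_SUB = 800.0     # subglottal pressure, Pa (about 8 cm H2O)
RHO = 1.14        # air density, kg/m^3


def glottal_area(t, f0=F0, a_mean=A_MEAN, a_amp=A_AMP):
    """Sinusoidally driven glottal area in m^2, clipped at 0 while the folds are closed."""
    return max(0.0, a_mean + a_amp * math.sin(2.0 * math.pi * f0 * t))


def volume_velocity(area, p_sub=P_SUB, rho=RHO):
    """Quasi-steady Bernoulli flow U = A * sqrt(2 * Ps / rho) in m^3/s.

    Assumes the whole subglottal pressure drops across the glottis, i.e. it
    ignores vocal tract loading, air inertance, and flow separation.
    """
    return area * math.sqrt(2.0 * p_sub / rho)


def simulate(fs=10000, duration=0.02):
    """Sample glottal area and volume velocity (0.02 s at 100 Hz is two periods)."""
    n = int(round(fs * duration))
    areas = [glottal_area(i / fs) for i in range(n)]
    flows = [volume_velocity(a) for a in areas]
    return areas, flows


if __name__ == "__main__":
    areas, flows = simulate()
    open_quotient = sum(1 for a in areas if a > 0.0) / len(areas)
    print(f"open quotient ~ {open_quotient:.2f}")
    print(f"peak flow ~ {max(flows) * 1e6:.0f} cm^3/s")
```

With these values the clipping produces a closed phase of roughly 40% of each cycle and a peak flow of about 300 cm^3/s. Replacing the sinusoid in `glottal_area` with the output of a nonlinear dynamic model would leave the rest of the sketch unchanged.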

A unique aspect of the research is that the proposed excitation models (vocal fold model and noise source model) and the speech synthesizer will be interactive, allowing the user to adjust model features that are related to physiological characteristics of the speech production process. By making such parameter adjustments, the researcher will be able to test new hypotheses about voice production interactively. Three possible applications are training aids for the hearing impaired, improved designs of speech production systems for the vocally impaired, and speech coding based on articulatory movement.

PROJECT REFERENCES

Childers, D. G., Glottal source modeling for voice conversion, Speech Communication, 16, 1995, 127-138.

Childers, D. G. and Ahn, C., Modeling the glottal volume-velocity waveform for three voice types, J. Acoust. Soc. Am., 97, 1995, 505-519.

Childers, D. G. and Hu, H. T., Speech synthesis by glottal linear prediction, J. Acoust. Soc. Am., 96, 1994, 2026-2036.

AREA BACKGROUND

Our research deals with the development of models of speech production. This work involves aspects of both speech synthesis and speech analysis. We are addressing issues in voice conversion (changing one speaker's voice to sound like that of another) and voice creation (creating new voices, as Mel Blanc did for cartoon characters). These issues include modeling vocal features that are related to a speaker's age, gender, emotional state, dialect, and health. The research results may be helpful for assessing vocal quality, performing speaker normalization, and understanding aspects of speaker-dependent and speaker-independent speech recognition. We are also modeling vocal fold function to assess its role in vocal quality. We have developed three speech synthesizers with interactive graphical user interfaces. We also have an interactive speech analysis software system for measuring various aspects of the speech signal.

AREA REFERENCES

Denes, P. B. and Pinson, E. N., The Speech Chain, 2nd ed., W. H. Freeman, 1993 (paperback).

Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals, Prentice-Hall, 1978.

RELATED PROGRAM AREAS

Adaptive Human Interfaces, Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

None at this time.