University of Florida
Dept. of Electrical and Computer Engineering
The proposed research has two major aspects. The first is to model the vibratory motion of the vocal folds. The second is to improve the modeling, and thereby the synthesis, of fricatives, plosives, and affricates using the articulatory speech synthesizer. The second task requires models of the excitation for these phonemes, for which we propose two approaches: spectral shaping and nonlinear dynamic modeling. Nonlinear dynamic modeling will also be included in the vocal fold modeling. Through interactive articulatory speech synthesis, both the vocal fold model and the noise source model will be used to clarify the cause-and-effect relationships between voice production features and aspects of the acoustic signal. The results of this research will improve our existing models of several voice types (modal register, vocal fry, and breathy). In addition, we will be able to extend our models to other voice types (e.g., whisper, falsetto, and harsh).
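As a rough illustration of the spectral shaping approach to noise excitation, the sketch below passes white noise through a two-pole resonator so that its energy concentrates around a chosen frequency. It is a minimal sketch and not the proposed model: the function name shape_noise, the 4000 Hz center frequency, the pole radius of 0.9, and the 16 kHz sample rate are all assumptions chosen for illustration.

```python
import math
import random


def shape_noise(noise, center_hz, pole_radius, sample_rate):
    """Filter a noise sequence with a two-pole resonator.

    Difference equation (hypothetical illustration):
        y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2]
    where theta = 2 pi center_hz / sample_rate and r = pole_radius (< 1).
    """
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1 = 2.0 * pole_radius * math.cos(theta)
    a2 = -(pole_radius ** 2)
    y1 = 0.0  # previous output sample, y[n-1]
    y2 = 0.0  # output sample before that, y[n-2]
    shaped = []
    for x in noise:
        y = x + a1 * y1 + a2 * y2
        shaped.append(y)
        y2, y1 = y1, y
    return shaped


# Assumed values: 0.5 s of unit-variance white noise at 16 kHz,
# shaped toward a 4 kHz peak.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(8000)]
frication = shape_noise(white, center_hz=4000.0, pole_radius=0.9,
                        sample_rate=16000.0)
```

Moving the pole radius toward 1 narrows the spectral peak, and changing center_hz moves it. This is the kind of adjustable parameter an interactive synthesizer could expose to the user.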
The interactive vocal fold model will provide a method for viewing aspects of the three-dimensional dynamic vibratory motion of the vocal folds. It will serve as a predictive model of both phonatory acoustics and glottal aerodynamics, and will provide a means for calculating the glottal area, the vocal tract area, and the volume velocity of airflow. The motion of the vocal folds in the model will be controlled by several methods, including sinusoidal functions and nonlinear dynamic modeling. A major motivation for this model is that vocal fold vibratory motion is difficult to measure by imaging and other methods.
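To make the relationship between sinusoidally driven fold motion, glottal area, and volume velocity concrete, the following minimal sketch computes a clipped sinusoidal glottal area and converts it to airflow with the Bernoulli orifice relation U = A * sqrt(2 P / rho). This is an assumed simplification rather than the proposed model: it ignores vocal tract loading, viscous losses, and the three-dimensional fold geometry. The function names and the default values (100 Hz fundamental, 0.2 cm^2 peak area, roughly 8 cm H2O subglottal pressure) are hypothetical.

```python
import math


def glottal_area(t, f0=100.0, max_area_cm2=0.2):
    """Glottal area in cm^2 at time t (seconds) for a sinusoidal drive.

    Negative half-cycles are clipped to zero to represent a closed glottis.
    """
    area = max_area_cm2 * math.sin(2.0 * math.pi * f0 * t)
    return max(area, 0.0)


def volume_velocity(area_cm2, subglottal_pressure=8000.0, air_density=1.14e-3):
    """Volume velocity in cm^3/s from the Bernoulli orifice relation.

    Assumed CGS units: pressure in dyn/cm^2 (about 8 cm H2O by default)
    and air density in g/cm^3.
    """
    return area_cm2 * math.sqrt(2.0 * subglottal_pressure / air_density)


# Sample one 10 ms cycle at 16 kHz (160 samples) and compute the flow waveform.
sample_rate = 16000.0
flow = [volume_velocity(glottal_area(n / sample_rate)) for n in range(160)]
```

With these assumed values the flow is zero during the closed half of each cycle and peaks at several hundred cm^3/s when the glottis is widest.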
A unique aspect of the research is that the proposed excitation models (the vocal fold model and the noise source model) and the speech synthesizer will be interactive, allowing the user to adjust model features that are related to physiological characteristics of the speech production process. By making such parameter adjustments, the researcher will be able to test new hypotheses concerning voice production interactively. Three possible applications are training aids for the hearing impaired, methods to improve the design of speech production systems for the vocally impaired, and speech coding based on articulatory movement.