Department of Electrical Engineering
San Jose State University
The model is based on the physical effects of sound propagation, and on neurophysiological studies that have traced the auditory pathways from the cochlea to the auditory cortex. It extends existing computational models of the cochlea by incorporating monaural and binaural, temporally-based correlation methods to extract the information needed for source localization. If successful, this work should significantly improve the abilities of computers to recognize speech or other sounds as they occur in everyday, multisource environments, thereby extending the range of effective human-machine interaction.
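As a rough illustration of what a binaural, temporally based correlation method computes, the sketch below estimates the interaural time difference (ITD) by locating the peak of the cross-correlation between the left-ear and right-ear signals. It is a minimal broadband example under stated assumptions, not the model described above: it skips the cochlear filter bank entirely, and the function name `estimate_itd`, the 1 ms search window, and the synthetic noise input are all hypothetical choices made for the example.

```python
import numpy as np


def estimate_itd(left, right, fs, max_itd=1e-3):
    """Estimate the interaural time difference in seconds.

    A positive result means the right-ear signal lags the left-ear signal.
    `max_itd` (an assumed search limit) bounds the lags that are examined.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    max_lag = int(round(max_itd * fs))

    # Full cross-correlation; output index len(left) - 1 is zero lag.
    xcorr = np.correlate(right, left, mode="full")
    zero = len(left) - 1

    # Keep only physically plausible lags and pick the strongest one.
    lags = np.arange(-max_lag, max_lag + 1)
    window = xcorr[zero - max_lag: zero + max_lag + 1]
    return lags[np.argmax(window)] / fs


# Synthetic check: the right-ear signal is the left-ear noise delayed
# by a known number of samples.
fs = 44100
delay = 10
rng = np.random.default_rng(0)
left = rng.standard_normal(2048)
right = np.concatenate([np.zeros(delay), left[:-delay]])
itd = estimate_itd(left, right, fs)
print(f"estimated ITD: {itd * 1e6:.1f} microseconds")
```

A fuller model would apply this kind of correlation separately in each frequency channel of a cochlear model rather than to the raw waveforms.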
R. O. Duda and W. L. Martens, "Range-dependence of the HRTF for a spherical head," Proc. 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (October, 1997).
R. O. Duda, "Elevation dependence of the interaural transfer function," in R. H. Gilkey and T. R. Anderson, Eds., Binaural and Spatial Hearing in Real and Virtual Environments, pp. 49-75 (Lawrence Erlbaum Associates, Hillsdale, NJ, 1997).
R. O. Duda, "Binaural hearing demonstrations," Acustica/Acta Acustica, Vol. 82, pp. 346-355 (March/April, 1996).
W. Chau and R. O. Duda, "Combined monaural and binaural localization of sound sources," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, November, 1995).
R. O. Duda, "Connectionist models for auditory scene analysis," in J. D. Cowan, G. Tesauro and J. Alspector, Eds., Advances in Neural Information Processing Systems 6, pp. 1069-1076 (Morgan Kaufmann, San Francisco, 1994).
C. Lim and R. O. Duda, "Estimating the azimuth and elevation of a sound source from the output of a cochlear model," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).
T. Shawan and R. O. Duda, "Adjacent-channel inhibition in acoustic onset detection," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).
R. O. Duda, "Modeling head related transfer functions," Proc. Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 457-461 (Asilomar, CA, October, 1994).
The basis for sound localization comes from three scientific areas: acoustics, auditory neurophysiology, and psychoacoustics. The physical cues for localizing sources are captured by the so-called head-related transfer function, which measures the directional dependence of the diffraction of incident sound waves by the torso, head and outer ears. Studies of the neural pathways from the cochlea to the auditory cortex provide inspiration for both the structure of a localization model and the kinds of signal processing that are appropriate, helping to define parameters such as filter bandwidths, response times, and compressive nonlinearities to cope with dynamic range. Studies in psychoacoustics reveal the different kinds of cues that humans use to localize sources, and human abilities to deal with echoes and reverberation. Our effort involves synthesizing this information in computational models of sound localization.
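To make the idea of a directional cue concrete, the sketch below evaluates one simple, widely used approximation: Woodworth's ray-tracing formula for the interaural time difference produced by a rigid spherical head and a distant source in the horizontal plane, ITD = (a / c)(theta + sin theta). This is an illustrative simplification, not the head-related transfer function itself, which also depends on frequency, elevation, the torso and the outer ears. The head radius of 0.0875 m, the sound speed of 343 m/s, and the function name `woodworth_itd` are assumptions chosen for the example.

```python
import math


def woodworth_itd(azimuth_deg, head_radius=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds for a rigid spherical head.

    `azimuth_deg` is measured from straight ahead and must lie in
    [-90, 90]. The default radius (meters) and sound speed (meters
    per second) are assumed typical values, not measured ones.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    # Woodworth's formula: path-length difference divided by sound speed.
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

The formula is odd-symmetric in azimuth and increases monotonically from straight ahead to the side, which is why an ITD measurement alone can indicate left-right position but cannot resolve elevation or front-back ambiguity.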
J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, Revised Edition (MIT Press, Cambridge, MA, 1997). The standard reference on the psychophysics of three-dimensional hearing.
A. S. Bregman, Auditory Scene Analysis (MIT Press, Cambridge, MA, 1990). A massive description of experiments by the author and his students on the factors that influence the formation and segregation of sound streams.
S. Carlile, Virtual Auditory Space: Generation and Applications (R. G. Landes Co., Austin, TX, 1996). A lucid and valuable book of survey chapters that emphasize the physical factors that control spatial hearing.
H. L. Hawkins, T. A. McMullen, A. N. Popper and R. R. Fay, Eds., Auditory Computation (Springer-Verlag, New York, 1996). An important edited volume of papers about models of the hearing process.
J. C. Middlebrooks and D. M. Green, "Sound localization by human listeners," Annu. Rev. Psychol., Vol. 42, pp. 135-159 (1991). An excellent review of the abilities of people to localize sound. Highly recommended.
2. Sound localization for the hearing impaired. By feeding the outputs of a real-time sound localizer to appropriate tactile or visual displays, one could provide the hearing impaired with information about objects and events that are out of sight.
3. 3-D sound for teleconferencing. Spatial sound synthesis should allow a participant in a teleconference to place the sounds from other participants in different spatial locations, and should both improve intelligibility and reduce the cognitive load.