A COMPUTATIONAL MODEL FOR SOUND LOCALIZATION

Richard O. Duda

Department of Electrical Engineering
San Jose State University

CONTACT INFORMATION

Department of Electrical Engineering
San Jose State University
San Jose, CA 95192
Phone: (408) 924-3917
Fax: (408) 924-3925
Email: rod@duda.org

WWW PAGE

http://www-engr.sjsu.edu/~duda

PROGRAM AREA

Other Communication Modalities

KEYWORDS

Sound localization, spatial hearing, 3-D sound, sound separation, auditory scene analysis, head-related transfer functions

PROJECT SUMMARY

The primary goal of our research is to create a model of the process by which people locate sounds in three dimensions. We are also investigating the application of our results to the synthesis of 3-D sound to enhance the human/computer interface. Our emphasis is on explaining well-established but still incompletely understood psychoacoustical phenomena. One example is our ability to locate sounds coming from above or below, despite the fact that there are no binaural differences between the sounds that reach the two ears. Another is our ability to locate sounds in reverberant environments that contain multiple sound sources, where echoes and reflections act as additional, virtual sources. A third is our ability to judge distance, even though loudness alone is not an adequate cue.

The model is based on the physical effects of sound propagation, and on neurophysiological studies that have traced the auditory pathways from the cochlea to the auditory cortex. It extends existing computational models of the cochlea by incorporating monaural and binaural, temporally based correlation methods to extract the information needed for source localization. If successful, this work should significantly improve the ability of computers to recognize speech or other sounds as they occur in everyday, multisource environments, thereby extending the range of effective human-machine interaction.
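
As a rough illustration of what a binaural, temporally based correlation method can extract, the sketch below estimates an interaural time difference by finding the lag that maximizes the cross-correlation of the two ear signals. It is a generic textbook technique applied to raw waveforms, not the project's model, which works on the outputs of a cochlear model. The function name estimate_itd, the test signal, and all parameter values are assumptions made for the example.

```python
import math


def estimate_itd(left, right, sample_rate, max_lag):
    """Return the lag (in seconds) of `right` relative to `left`
    that maximizes their cross-correlation.

    A positive result means the right-ear signal arrives later.
    Hypothetical helper for illustration only.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Assumed test signal: a 500 Hz tone burst with a Gaussian envelope.
fs = 44100
n = 400
left = [
    math.exp(-((i - 200) / 40.0) ** 2) * math.sin(2 * math.pi * 500 * i / fs)
    for i in range(n)
]

# Simulate a source on the left: the right ear hears it 5 samples later.
delay = 5
right = [0.0] * delay + left[: n - delay]

itd = estimate_itd(left, right, fs, max_lag=30)
print(f"estimated ITD: {itd * 1e6:.1f} microseconds")
```

The search range is kept shorter than half a period of the tone so that the correlation has a single peak; with periodic signals and a wider range, the estimate becomes ambiguous.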

PROJECT REFERENCES

C. P. Brown and R. O. Duda, "An efficient HRTF model for 3-D sound," Proc. 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (October, 1997).

R. O. Duda and W. L. Martens, "Range-dependence of the HRTF for a spherical head," Proc. 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (October, 1997).

R. O. Duda, "Elevation dependence of the interaural transfer function," in R. H. Gilkey and T. R. Anderson, Eds., Binaural and Spatial Hearing in Real and Virtual Environments, pp. 49-75 (Lawrence Erlbaum Associates, Hillsdale, NJ, 1997).

R. O. Duda, "Binaural hearing demonstrations," Acustica/Acta Acustica, Vol. 82, pp. 346-355 (March/April, 1996).

W. Chau and R. O. Duda, "Combined monaural and binaural localization of sound sources," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, November, 1995).

R. O. Duda, "Connectionist models for auditory scene analysis," in J. D. Cowan, G. Tesauro and J. Alspector, Eds., Advances in Neural Information Processing Systems 6, pp. 1069-1076 (Morgan Kaufmann, San Francisco, 1994).

C. Lim and R. O. Duda, "Estimating the azimuth and elevation of a sound source from the output of a cochlear model," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).

T. Shawan and R. O. Duda, "Adjacent-channel inhibition in acoustic onset detection," Proc. Twenty-Eighth Asilomar Conference on Signals, Systems and Computers (Asilomar, CA, October, 1994).

R. O. Duda, "Modeling head related transfer functions," Proc. Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, pp. 457-461 (Asilomar, CA, October, 1994).

AREA BACKGROUND

Sound localization is part of what is called auditory scene analysis -- the decomposition of sound into source components and the characterization of the acoustic environment. On the input side, the goal is to allow computers to cope with speech and other sounds that are encountered in everyday, multi-source, reverberant environments. On the output side, the goal is to provide effective ways to synthesize realistic spatial sounds.

The basis for sound localization comes from three scientific areas: acoustics, auditory neurophysiology, and psychoacoustics. The physical cues for localizing sources are captured by the so-called head-related transfer function, which measures the directional dependence of the diffraction of incident sound waves by the torso, head and outer ears. Studies of the neural pathways from the cochlea to the auditory cortex provide inspiration for both the structure of a localization model and the kinds of signal processing that are appropriate, helping to define parameters such as filter bandwidths, response times, and compressive nonlinearities to cope with dynamic range. Studies in psychoacoustics reveal the different kinds of cues that humans use to localize sources, and human abilities to deal with echoes and reverberation. Our effort involves synthesizing this information in computational models of sound localization.
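
To give a feel for how one physical cue depends on direction, the sketch below evaluates Woodworth's classical approximation for the interaural time difference of a distant source heard by a rigid spherical head: ITD = (a / c)(theta + sin theta), where a is the head radius, c the speed of sound, and theta the azimuth. This is a standard simplification that captures only one aspect of the head-related transfer function; it ignores the torso and outer ears, and it is not presented here as the project's model. The function name and the default radius and sound speed are assumed, typical values.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (seconds) for a distant source.

    Uses Woodworth's rigid-sphere formula, valid for azimuths between
    -90 and +90 degrees (0 = straight ahead). The default head radius
    and speed of sound are assumed typical values.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg: ITD = {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

With these assumed values the time difference grows from zero straight ahead to roughly two thirds of a millisecond for a source directly to one side. Because the formula depends on azimuth alone, it also shows why time differences by themselves cannot resolve elevation.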

AREA REFERENCES

D. Begault, 3-D Sound for Virtual Reality and Multimedia (Academic Press, Boston, MA, 1994). An elementary but very clear presentation of 3-D audio principles and current technology.

J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, Revised Edition (MIT Press, Cambridge, MA, 1997). The standard reference on the psychophysics of three-dimensional hearing.

A. S. Bregman, Auditory Scene Analysis (MIT Press, Cambridge, MA, 1990). A massive description of experiments by the author and his students on the factors that influence the formation and segregation of sound streams.

S. Carlile, Virtual Auditory Space: Generation and Applications (R. G. Landes Co., Austin, TX, 1996). A lucid and valuable book of survey chapters that emphasize the physical factors that control spatial hearing.

H. L. Hawkins, T. A. McMullen, A. N. Popper and R. R. Fay, Eds., Auditory Computation (Springer-Verlag, New York, 1996). An important edited volume of papers about models of the hearing process.

J. C. Middlebrooks and D. M. Green, "Sound localization by human listeners," Annu. Rev. Psychol., Vol. 42, pp. 135-159 (1991). An excellent review of the abilities of people to localize sound. Highly recommended.

RELATED PROGRAM AREAS

Virtual Environments, Adaptive Human Interfaces, Intelligent Interactive Systems for Persons with Disabilities.

POTENTIAL RELATED PROJECTS

1. Speech recognition in natural, multisource environments. Early work by Weintraub and more recent work by Bodden and Blauert suggest that the techniques of auditory scene analysis, though computationally expensive, have the potential to cope with interfering sounds for which a random-noise model is inadequate.

2. Sound localization for the hearing impaired. By feeding the outputs of a real-time sound localizer to appropriate tactile or visual displays, one could provide the hearing impaired with information about objects and events that are out of sight.

3. 3-D sound for teleconferencing. Spatial sound synthesis should allow a participant in a teleconference to place the sounds from other participants in different spatial locations, and should both improve intelligibility and reduce the cognitive load.