IMPROVED REAL-TIME AUDIO INTERFACE
FOR HUMAN-COMPUTER INTERACTION
Michael W. Hoffman
University of Nebraska-Lincoln
Department of Electrical Engineering
209N WSEC
Lincoln, NE 68588-0511
Phone: 402 472-1979
FAX: 402 472-4732
The research will examine potential solutions to several needs in the audio interface between a human and a computer (or machine). These needs include reducing background noise and reverberation. Because machines also generate audio signals for the human, the interface must also remove this machine-generated audio (acoustic feedback) from the speech input that the machine receives from the human. Finally, all of these needs must be met in real time for the human-machine interface to be of any practical use.
The results sought from the project include an enhanced speech input to a machine that will allow subsequent processing algorithms for word recognition, speaker identification, speech compression, etc., to be more effective. The approach suggested is general: it can be used with any microphone configuration and with any type of acoustic interface between a human and a machine. Since the array picks up signals remotely, the user will not be connected to the machine by a cable. The approach is "cellular" in that the processor can be modified to track a user who is moving within a room: it determines the user's position and changes the constraints that define the robust adaptive processor so that signals generated within the cell containing the user's estimated position are preserved.
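The cellular idea can be illustrated with a minimal sketch. It is not the project's processor: it assumes a two-dimensional room divided into square cells, and it stands in for the constraint set with simple steering delays toward the center of the active cell. The names `CellTracker`, `cell_of`, `cell_center`, and `steering_delays`, and the speed-of-sound value, are hypothetical choices made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def cell_of(position, cell_size):
    """Return the (ix, iy) index of the square cell containing position."""
    x, y = position
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cell_center(cell, cell_size):
    """Return the (x, y) coordinates of the center of a cell."""
    ix, iy = cell
    return ((ix + 0.5) * cell_size, (iy + 0.5) * cell_size)


def steering_delays(mics, focus):
    """Propagation delays (seconds) from focus to each microphone,
    relative to the microphone nearest the focus point."""
    dists = [math.dist(m, focus) for m in mics]
    nearest = min(dists)
    return [(d - nearest) / SPEED_OF_SOUND for d in dists]


class CellTracker:
    """Hypothetical helper: recomputes the steering delays (a stand-in
    for the processor constraints) only when the user changes cell."""

    def __init__(self, mics, cell_size):
        self.mics = mics
        self.cell_size = cell_size
        self.cell = None
        self.delays = None

    def update(self, estimated_position):
        """Return True if the user entered a new cell and delays changed."""
        new_cell = cell_of(estimated_position, self.cell_size)
        if new_cell == self.cell:
            return False
        self.cell = new_cell
        focus = cell_center(new_cell, self.cell_size)
        self.delays = steering_delays(self.mics, focus)
        return True
```

In this sketch, small movements inside a cell leave the delays untouched, so the processor is reconfigured only when the estimated position crosses a cell boundary.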
Z. Li and M. W. Hoffman. "Enhancing Noisy Speech for Coding with a Microphone Array." Submitted to IEEE Transactions on Speech and Audio Processing, January 1997.
M. W. Hoffman and Z. Li. "Applications of microphone arrays to speech processing." Journal of the Acoustical Society of America, Program, Vol. 100, No. 4, part 2, p. 2696, Honolulu, HI, Fall 1996.
M. W. Hoffman. "Microphone Array Calibration for Robust Adaptive Processing." In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Lake Mohonk, New Paltz, NY, Oct. 1995.
M. W. Hoffman and K. M. Buckley. "Robust time-domain processing of broadband acoustic data." IEEE Transactions on Speech and Audio Processing, Vol.3, pp. 193-203, May 1995.
The current project applies a spatial filter to provide clean acoustic signals to a machine (computer) in a noisy and reverberant environment. A voice-controlled system's reliability depends upon a clear, uncorrupted speech signal as input to the automated speech processing system. Virtually all processing algorithms designed for speech signals work better when the input speech is not corrupted by interfering noise and distortion. Speech coders, word recognition systems, and speaker identification systems are often very sensitive to background noise and reverberation. Digitally processed signals from an array of microphones provide enhanced speech input for the machine's automated processing systems. In addition, the processing that improves the speech input quality will not place hardships on the human user of the system, such as requiring a microphone attached to the machine by a cable or a precise fixed location for the human. The project attempts to advance a sophisticated interface between humans and machines that places the burden of processing and inconvenience on the machine rather than on the human user.
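As a simple illustration of spatial filtering with a microphone array, the sketch below implements a fixed delay-and-sum beamformer. This is a deliberately simpler technique than the robust adaptive processing pursued in the project. It assumes integer sample delays that are already known; the function name `delay_and_sum` and its arguments are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Fixed delay-and-sum beamformer (illustrative sketch).

    channels: list of equal-length sample lists, one per microphone.
    delays:   delays[i] is the number of samples by which the target
              signal arrives later at microphone i (non-negative ints).
    Returns the average of the time-aligned channels. Output samples
    near the end, where an advanced channel has run out of data, are
    averaged over fewer contributions and are therefore attenuated.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    n = len(channels[0])
    out = [0.0] * n
    for chan, d in zip(channels, delays):
        # Advance each channel by its delay so the target lines up.
        for t in range(n - d):
            out[t] += chan[t + d]
    return [v / len(channels) for v in out]
```

After alignment the target adds coherently across channels, while signals arriving from other directions are misaligned and partially cancel. The spatial selectivity of the array comes from this difference.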
H. Cox, R. M. Zeskind, and M. M. Owen. "Robust adaptive beamforming." IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-35, pp. 1365-1376, Oct. 1987.
J. E. Greenberg and P. M. Zurek. "Evaluation of an adaptive beamforming method for hearing aids." Journal of the Acoustical Society of America, Vol. 91, pp. 1662-1676, Mar. 1992.
Special Session on Microphone Array Processing, Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 211-254, April 1997.
Other program areas that are attempting to ease the burden that human-computer interactions place on the human may be able to exploit user information (such as position in a room, movement, etc.) to better anticipate the needs of the user. The microphone array interface should be able to provide useful cues about user position, user movements, and histories of such movements. Arrays of sensors allow two separate functions: signal enhancement (i.e., beamforming) and source localization (i.e., direction finding). While the primary emphasis of the current project is signal enhancement, some emphasis could be placed on exploiting the localization capabilities of the sensor array.
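The localization function can be illustrated with a minimal direction-finding sketch for a single pair of microphones. It is an assumption-laden example, not the project's method: it estimates the time difference of arrival by brute-force cross-correlation and converts it to an angle with a far-field (plane-wave) model. The names `estimate_lag` and `arrival_angle` and the speed-of-sound value are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_lag(ref, other, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive result means `other` is a delayed copy of `ref`,
    i.e. the sound reached the reference microphone first.
    """
    n = len(ref)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += ref[t] * other[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag, sample_rate, spacing):
    """Far-field arrival angle in radians, measured from broadside
    toward the reference microphone, for a pair `spacing` meters apart."""
    ratio = SPEED_OF_SOUND * lag / (sample_rate * spacing)
    ratio = max(-1.0, min(1.0, ratio))  # guard against lags beyond the physical limit
    return math.asin(ratio)
```

A history of such angle estimates from several microphone pairs is the kind of position and movement cue that other program areas could draw on.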