Francis Quek*, Rashid Ansari**, and David McNeill+
*Vision Interfaces and Systems Lab (VISLab), EECS Dept
The University of Illinois at Chicago
**Signal and Image Research Lab, EECS Dept
The University of Illinois at Chicago
+Department of Psychology
The University of Chicago
1120 SEO, M/C 154, EECS Dept.
The University of Illinois at Chicago,
851 S. Morgan
Chicago, IL 60607
Phone: (312) 996-5494
Fax: (312) 413-0024
Email: quek@eecs.uic.edu
Other Communication Modalities.
gesture, speech, gaze, discourse segmentation, computer vision, psycholinguistics, signal processing
Our research addresses the interpretation of gesture, speech, and gaze in the context of discourse management. We shall investigate the cues afforded by each mode of interaction and the algorithms necessary to detect and extract them; study the spatial and temporal relationships among these cues and associate them with topical units in discourse; study the interactions of gesture, speech, and gaze in discourse segmentation; and develop a system that integrates these elements into a coherent whole. Our approach involves experiments designed to discover and quantify cues in the various modalities and their relation to discourse management; the development of computational algorithms to detect and recognize such cues; and the integration of these cues into a cogent discourse management system.
We shall integrate this research in a hierarchical model that is both amenable to computational implementation and reflective of human communicative realities. The base of this hierarchy is populated by atomic units (AUs) in the various modalities: hand movement units for gesture, intonation units for speech, and gaze units for gaze. These AUs will be attributed with modality-specific features (e.g., hand movement extrema for gesture, fundamental frequency peak for speech, and direction for gaze). AUs will also be time-stamped with their onset and duration. In the next level of our hierarchy, these AUs will be grouped into composite units (CUs) that are related to higher-level discourse units. We then apply search techniques to find the most consistent segmentation across modalities based on our discourse segmentation rules.
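The two lower levels of this hierarchy can be illustrated with a minimal sketch. It is not the project's implementation: the class names, the field names, and the rule that groups AUs into a CU whenever their time spans overlap are all assumptions made for illustration, since the summary does not specify the grouping criterion.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicUnit:
    """One AU: a time-stamped unit in a single modality (hypothetical layout)."""
    modality: str                 # "gesture", "speech", or "gaze"
    onset: float                  # start time in seconds
    duration: float               # length in seconds
    features: dict = field(default_factory=dict)  # modality-specific features

    @property
    def end(self) -> float:
        return self.onset + self.duration


@dataclass
class CompositeUnit:
    """One CU: a group of AUs, possibly from different modalities."""
    units: list

    @property
    def onset(self) -> float:
        return min(u.onset for u in self.units)

    @property
    def end(self) -> float:
        return max(u.end for u in self.units)


def group_by_overlap(units, max_gap=0.0):
    """Group AUs into CUs when their time spans overlap.

    Assumed rule: an AU joins the current CU if it starts no later than
    max_gap seconds after that CU ends; otherwise it starts a new CU.
    """
    groups = []
    for unit in sorted(units, key=lambda u: u.onset):
        if groups and unit.onset <= groups[-1].end + max_gap:
            groups[-1].units.append(unit)
        else:
            groups.append(CompositeUnit([unit]))
    return groups


aus = [
    AtomicUnit("gesture", 0.0, 1.0, {"extrema": [(120, 80), (200, 95)]}),
    AtomicUnit("speech", 0.5, 1.0, {"f0_peak_hz": 210.0}),
    AtomicUnit("gaze", 3.0, 0.5, {"direction": "left"}),
]
cus = group_by_overlap(aus)
```

In this toy input the gesture and speech units overlap in time and fall into one CU, while the later gaze unit forms its own. A search over alternative groupings, as the summary describes, would replace the single greedy pass used here.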
We have assembled a strong interdisciplinary team comprising psycholinguistic, machine vision, and signal processing researchers to address the scope of our proposed research. This permits us to base our research squarely on the realities of human communication in spontaneous discourse across a wide range of pragmatic conditions. The technology developed will have significant impact on natural language discourse analysis, human-computer interaction systems, and discourse and video databases. Another significant outcome of this research will be the introduction of computational and quantitative rigor to the psycholinguistic study of discourse production. This represents a model of collaborative research between the fields of engineering and cognitive science.
D. McNeill, Hand and Mind: What Gestures Reveal About Thought, University of Chicago Press, Chicago, 1992.
Quek, F., "Unencumbered Gestural Interaction," IEEE Multimedia, Vol 4, No. 3, pp 36-47, Winter, 1996.
Quek, F., and Zhao, M.D., "Inductive learning in hand pose recognition," in Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, Killington, Vermont, pp 78-83, October, 1996.
Quek, F., "Eyes in the Interface," International Journal of Image and Vision Computing, Vol. 13, Number 6, August, 1995, pp. 511-525.
Quek, F., "Toward a Vision-Based Hand Gesture Interface," Proceedings of the Virtual Reality System Technology Conference, Singapore, August 23-26, 1994, pp. 17-29.
Quek, F., and Bryll, R., "Vector Coherence Mapping: A parallelizable approach to image flow computation," submitted to the IEEE International Conference on Computer Vision, Bombay, 1998.
Our project combines research in four areas: Psycholinguistics, Computer Vision, Signal Processing, and Human-Computer Interaction. While many psycholinguistic questions about multimodal human discourse are yet to be answered, there is a growing body of knowledge that may be applied to understanding how discourse may be segmented using such multimodal input. Psycholinguistic research in human speech, gesture, and gaze provides information on the acoustic and visual cues that may be applied in segmenting discourse into semantic units. We also have qualitative information about how the various communication channels may cohere in time and space.
Most gestures may be divided into five dynamic components: Preparation, Pre-stroke hold, Stroke, Post-stroke hold, and Retraction. Of these, the Stroke is the most semantically significant unit. Our previous work suggests that the salient detectable components of such gestures are the locations of the stroke extrema, the hand poses at these extrema, and the dynamics of the stroke between the extrema. Vision-based techniques to detect visual flow, skin color, and hand shape have direct bearing on the recognition of these components. Human gaze comprises two components: head orientation and eye fixation. Vision-based techniques for tracking gaze form a significant part of our research. Speech acoustics provide the third cue for discourse segmentation. Our emphasis is on the surface characteristics of the speech signal, independent of the recognition of the words in the speech content. Our focus will be largely on intonation and amplitude patterns.
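As one illustration of a surface characteristic that needs no word recognition, the sketch below computes a frame-by-frame amplitude contour and flags sustained low-amplitude stretches. It is only a sketch under stated assumptions: the function names, the non-overlapping framing, and the idea of treating low-amplitude runs as candidate boundaries are hypothetical choices, not methods described in this summary, and intonation analysis is not covered.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive non-overlapping frames of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def low_energy_runs(rms, threshold, min_frames):
    """Return (start, end) frame index pairs, end exclusive, for runs of at
    least min_frames consecutive frames whose RMS is below threshold."""
    runs = []
    start = None
    # A sentinel value above any threshold closes a run that reaches the end.
    for i, value in enumerate(list(rms) + [float("inf")]):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    return runs


# Toy signal: one loud frame, two silent frames, one loud frame (4 samples each).
samples = [1.0] * 4 + [0.0] * 8 + [1.0] * 4
contour = frame_rms(samples, frame_len=4)
pauses = low_energy_runs(contour, threshold=0.1, min_frames=2)
```

For the toy signal the contour has four frames and the two silent frames in the middle are reported as a single run. Real speech would need overlapping windows and a threshold tuned to the recording conditions.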
D. Bolinger, Intonation and Its Parts, Stanford University Press, 1986.
R. Jacob, "Eye movement-based human-computer interaction techniques," in H. R. Hartson and D. Hix, Eds., Advances in Human-Computer Interaction, Vol. 4, pp. 151-190, Ablex Publishing Company, 1993.
Adaptive Human Interfaces, Intelligent Interactive Systems for Persons with Disabilities, Speech and Natural Language Understanding
None at this time