
Chapter 9 References
- ACL83
-
Association for Computational Linguistics.
Proceedings of the 21st Annual Meeting of the Association for
Computational Linguistics, Cambridge, Massachusetts, 1983.
- ACL90
-
Association for Computational Linguistics.
Proceedings of the 28th Annual Meeting of the Association for
Computational Linguistics, Pittsburgh, Pennsylvania, 1990.
- ACL92
-
Association for Computational Linguistics.
Proceedings of the 30th Annual Meeting of the Association for
Computational Linguistics, University of Delaware, 1992.
- ACL93
-
Association for Computational Linguistics.
Proceedings of the 31st Annual Meeting of the Association for
Computational Linguistics, Ohio State University, 1993.
- AGvB93
-
F. D. Anger, H. W. Güsgen, and J. van Benthem, editors.
Proceedings of the IJCAI-93 Workshop on Spatial and Temporal
Reasoning (W17), Chambéry, France, August 1993.
- AL91
-
C. Abry and M. T. Lallouache.
Audibility and stability of articulatory movements: Deciphering two
experiments on anticipatory rounding in French.
In Proceedings of the 12th International Congress of Phonetic
Sciences, volume 1, pages 220--225, Aix-en-Provence, France, 1991.
- All83
-
James F. Allen.
Maintaining knowledge about temporal intervals.
Communications of the ACM, 26(11):832--843, 1983.
- ANL94
-
ACL.
Proceedings of the Fourth Conference on Applied Natural Language
Processing, Stuttgart, Germany, 1994. Morgan Kaufmann.
- ARP93
-
Advanced Research Projects Agency.
Proceedings of the 1993 ARPA Human Language Technology
Workshop, Princeton, New Jersey, March 1993. Morgan Kaufmann.
- Asi94
-
IEEE.
Proceedings of the 28th Asilomar Conference on Signals, Systems
and Computers, October 1994.
- AV93
-
Michel Aurnague and Laure Vieu.
A logical framework for reasoning about space.
In Anger et al. [AGvB93], pages 123--158.
- BAT92
-
Robert C. Berwick, Steven P. Abney, and Carol Tenny, editors.
Principle-Based Parsing: Computation and Psycholinguistics.
Kluwer, Dordrecht, The Netherlands, 1992.
- BB92
-
G. E. Blonder and R. A. Boie.
Capacitive moments sensing for electronic paper.
U.S. Patent 5 113 041, May 1992.
- BC94
-
G. Burdea and P. Coiffet.
Virtual Reality Technology.
John Wiley, New York, 1994.
- Bel95
-
Y. Bellik.
Interfaces Multimodales: Concepts, Modèles et Architectures.
PhD thesis, Université d'Orsay, Paris, 1995.
- Ber93
-
N. O. Bernsen.
Modality theory: Supporting multimodal interface design.
In ERCIM, editor, Proceedings of the Workshop ERCIM on
Human-Computer Interaction, Nancy, November 1993.
- BF90
-
D. A. Berkley and J. L. Flanagan.
HuMaNet: An experimental human/machine communication network based
on ISDN.
AT&T Technical Journal, 69:87--98, September 1990.
- BHMW93
-
Christoph Bregler, Hermann Hild, Stefan Manke, and Alex Waibel.
Improving connected letter recognition by lipreading.
In Proceedings of the 1993 International Conference on
Acoustics, Speech, and Signal Processing, volume 1, pages 557--560. IEEE, April 1993.
- BJKZ85
-
R. Bajcsy, A. Joshi, E. Krotkov, and A. Zwarico.
LandScan: A natural language and computer vision system for
analyzing aerial images.
In Proceedings of the 9th International Joint Conference on
Artificial Intelligence, pages 919--921, Los Angeles, 1985.
- BL85
-
P. Bergeron and P. Lachapelle.
Controlling facial expressions and body movements in the computer
generated animated short `Tony de Peltrie'.
In SigGraph '85 Tutorial Notes, 1985.
- BLMA92
-
C. Benoît, M. T. Lallouache, T. Mohamadi, and C. Abry.
A set of French visemes for visual speech synthesis.
In G. Bailly and C. Benoît, editors, Talking Machines:
Theories, Models, and Designs, pages 485--504. Elsevier Science, 1992.
- BMK94
-
C. Benoît, T. Mohamadi, and S. Kandel.
Effects of phonetic context on audio-visual intelligibility of
French.
Journal of Speech and Hearing Research, 37:1195--1203, 1994.
- BOK94
-
Christoph Bregler, Stephen Omohundro, and Yochai Konig.
A hybrid approach to bimodal speech recognition.
In Asilomar [Asi94].
- Bol80
-
R. A. Bolt.
Put-that-there: Voice and gesture at the graphic interface.
Computer Graphics, 14(3):262--270, August 1980.
- BOYBJ90
-
F. Brooks, M. Ouh-Young, J. Batter, and P. Jerome.
Project GROPE: Haptic displays for scientific visualization.
Computer Graphics, 24(4):177--185, 1990.
- BPW93
-
N. I. Badler, C. B. Phillips, and B. L. Webber.
Simulating Humans: Computer Graphics Animation and Control.
Oxford University Press, New York, 1993.
- Bre82
-
J. Bresnan, editor.
The Mental Representation of Grammatical Relations.
MIT Press, Cambridge, Massachusetts, 1982.
- Bro90
-
N. Michael Brooke.
Visible speech signals: Investigating their analysis, synthesis and
perception.
In M. M. Taylor, F. Néel, and D. G. Bouwhuis, editors, The
Structure of Multimodal Dialogue. Elsevier Science, Amsterdam, 1990.
- BZ91
-
G. Burdea and J. Zhuang.
Dextrous telerobotics with force feedback.
Robotica, 9(1 & 2):171--178; 291--298, 1991.
- Che57
-
C. Cherry.
On Human Communication.
Wiley, New York, 1957.
- CM90
-
M. M. Cohen and D. W. Massaro.
Synthesis of visible speech.
Behavior Research Methods, Instruments, & Computers,
22(2):260--263, 1990.
- CM93
-
M. M. Cohen and D. W. Massaro.
Modeling coarticulation in synthetic visual speech.
In N. M. Thalmann and D. Thalmann, editors, Models and
techniques in computer animation, pages 139--156. Springer-Verlag, Tokyo,
1993.
- CNS93
-
J. Coutaz, L. Nigay, and D. Salber.
The MSM framework: A design space for multi-sensori-motor systems.
In L. Bass, J. Gornostaev, and C. Unger, editors, Lecture Notes
in Computer Science, Selected Papers, EWHCI'93, East-West Human Computer
Interaction, pages 231--241. Springer-Verlag, Moscow, August 1993.
- CO71
-
W. Condon and W. Ogston.
Speech and body motion synchrony of the speaker-hearer.
In D. Horton and J. Jenkins, editors, The Perception of
Language, pages 150--184. Academic Press, 1971.
- Coh93
-
Anthony Cohn.
Modal and non-modal qualitative spatial logics.
In Anger et al. [AGvB93], pages 87--92.
- COL94
-
Proceedings of the 15th International Conference on Computational
Linguistics, Kyoto, Japan, 1994.
- COS93
-
Proceedings of the European Conference on Spatial Information Theory
(COSIT'93), volume 716 of Lecture Notes in Computer Science.
Springer-Verlag, September 1993.
- DAR91
-
Defense Advanced Research Projects Agency.
Proceedings of the Fourth DARPA Speech and Natural Language
Workshop, Pacific Grove, California, February 1991. Morgan Kaufmann.
- DPH93
-
John R. Deller, Jr., John G. Proakis, and John H. Hansen.
Discrete-Time Processing of Speech Signals.
Macmillan, 1993.
- EAC93
-
European Chapter of the Association for Computational Linguistics.
Proceedings of the Sixth Conference of the European Chapter of
the Association for Computational Linguistics, Utrecht University, The
Netherlands, 1993.
- EHSH93
-
Paul Ekman, Thomas Huang, Terrence Sejnowski, and Joseph Hager.
Final report to NSF of the planning workshop on facial expression
understanding (July 30 to August 1, 1992).
Technical report, University of California, San Francisco, March
1993.
- EL90
-
L. Erman and V. Lesser.
The Hearsay-II speech understanding system: A tutorial.
In Readings in Speech Recognition, pages 235--245. Morgan
Kaufmann, 1990.
- Erb75
-
N. P. Erber.
Auditory-visual perception of speech.
Journal of Speech and Hearing Disorders, 40:481--492, 1975.
- ESC94
-
European Speech Communication Association.
Proceedings of the Second ESCA/IEEE Workshop on Speech
Synthesis, New Paltz, New York, September 1994.
- FCF92
-
A. U. Frank, I. Campari, and U. Formentini, editors.
Proceedings of the International Conference GIS---From Space
to Territory: Theories and Methods of Spatio-Temporal Reasoning, number 639
in Springer Lecture Notes in Computer Science, Pisa, Italy, September 1992.
Springer-Verlag.
- Fin86
-
Kathleen Finn.
An Investigation of Visible Lip Information to be Used in
Automatic Speech Recognition.
PhD thesis, Georgetown University, 1986.
- Fis68
-
C. G. Fisher.
Confusions among visually perceived consonants.
Journal of Speech and Hearing Research, 11:796--804, 1968.
- FKK+92
-
A. G. Fraser, C. R. Kalmanek, A. E. Kaplan, W. T. Marshall, and R. C. Restrick.
XUNET 2: A nationwide testbed in high-speed networking.
In INFOCOM 92, Florence, Italy, May 1992.
- Fla92
-
J. L. Flanagan.
Technologies for multimedia information systems.
Transactions, Institute of Electronics, Information and
Communication Engineers, 75(2):164--178, February 1992.
- Fla94
-
J. L. Flanagan.
Technologies for multimedia communications.
Proceedings of the IEEE, 82(4):590--603, April 1994.
- FM93
-
S. K. Feiner and K. R. McKeown.
Automating the generation of coordinated multimedia explanations.
In Maybury [May93], pages 117--138.
- FSJ93
-
J. L. Flanagan, A. C. Surendran, and E. E. Jan.
Spatially selective sound capture for speech and audio processing.
Speech Communication, 13:207--222, 1993.
- Fur89
-
Sadaoki Furui.
Digital Speech Processing, Synthesis, and Recognition.
Marcel Dekker, New York, 1989.
- GGMCB94
-
B. Le Goff, T. Guiard-Marigny, M. Cohen, and C. Benoît.
Real-time analysis-synthesis and intelligibility of talking faces.
In ESCA [ESC94], pages 53--56.
- GGP92
-
Oscar Garcia, Alan Goldschen, and Eric Petajan.
Feature extraction for optical automatic speech recognition or
automatic lipreading.
Technical Report GWU-IIST-92-32, The George Washington University,
Department of Electrical Engineering and Computer Science, November 1992.
- GGP94
-
Alan Goldschen, Oscar Garcia, and Eric Petajan.
Continuous optical automatic speech recognition.
In Asilomar [Asi94].
- GMAB94
-
T. Guiard-Marigny, A. Adjoudani, and C. Benoît.
A 3-D model of the lips for visual speech synthesis.
In ESCA [ESC94], pages 49--52.
- Gol93
-
Alan Goldschen.
Continuous Automatic Speech Recognition by Lipreading.
PhD thesis, The George Washington University, Washington, DC, 1993.
- Her86
-
A. Herskovits.
Language and Spatial Cognition.
Cambridge University Press, New York, 1986.
- HL94
-
C. Henton and P. Litwinowicz.
Saying and seeing it with feeling: techniques for synthesizing
visible, emotional speech.
In ESCA [ESC94], pages 73--76.
- HM93
-
A. G. Hauptmann and P. McAvinney.
Gestures with speech for graphic manipulation.
International Journal of Man-Machine Studies, 38(2):231--249,
February 1993.
- HPS94
-
Marcus Hennecke, K. Prasad, and David Stork.
Using deformable templates to infer visual speech dynamics.
In Asilomar [Asi94].
- ICP93
-
Bulletin de la communication parlée, 2, January 1993.
Université Stendhal, Grenoble, France.
- Kar90
-
H. Karlgren, editor.
Proceedings of the 13th International Conference on
Computational Linguistics, Helsinki, 1990. ACL.
- Kei68
-
W. D. Keidel.
Information processing by sensory modalities in man.
In Cybernetic Problems in Bionics, pages 277--300. Gordon and
Breach, 1968.
- LA93
-
A. Lascarides and N. Asher.
Temporal interpretation, discourse relations and commonsense entailment.
Linguistics and Philosophy, 16(5):437--493, 1993.
- Lig93
-
Gérard Ligozat.
Models for qualitative spatial reasoning.
In Anger et al. [AGvB93], pages 35--45.
- MA94
-
M. W. Mak and W. G. Allen.
Lip-motion analysis for speech segmentation in noise.
Speech Communication, 14:279--296, 1994.
- Mas87
-
D. W. Massaro.
Speech perception by ear and eye: a paradigm for psychological
inquiry.
Lawrence Erlbaum, Hillsdale, New Jersey, 1987.
- May93
-
M. T. Maybury, editor.
Intelligent Multimedia Interfaces.
AAAI Press, Menlo Park, California, 1993.
- McD82
-
D. McDermott.
A temporal logic for reasoning about processes and plans.
Cognitive Science, 6:101--155, 1982.
- McK94
-
P. McKevitt.
The integration of natural language and vision processing.
Artificial Intelligence Review Journal, 8:1--3, 1994.
Special volume.
- MF91
-
D. M. Mark and A. U. Frank, editors.
Cognitive and Linguistic Aspects of Geographic Space,
Dordrecht, 1991. NATO Advanced Studies Institute, Kluwer.
- MM76
-
Harry McGurk and John MacDonald.
Hearing lips and seeing voices.
Nature, 264:746--748, December 23/30 1976.
- MP91
-
Kenji Mase and Alex Pentland.
Automatic lipreading by optical flow analysis.
Systems and Computers in Japan, 22(6):67--76, 1991.
- MS88
-
M. Moens and M. J. Steedman.
Temporal ontology and temporal reference.
Computational Linguistics, 14(2):15--28, 1988.
- MTPT88
-
N. Magnenat-Thalmann, E. Primeau, and D. Thalmann.
Abstract muscle action procedures for human face animation.
Visual Computer, 3:290--297, 1988.
- MTS92
-
Joseph Mariani, D. Teil, and O. Da Silva.
Gesture recognition.
Technical Report LIMSI Report, Centre National de la Recherche
Scientifique, Orsay, France, December 1992.
- Nak88
-
A. Nakhimovsky.
Aspect, aspectual class, and the temporal structure of narrative.
Computational Linguistics, 14(2):29--43, 1988.
- NB93
-
B. Nebel and H.-J. Bürckert.
Reasoning about temporal relations: A maximal tractable subclass of
Allen's interval algebra.
Technical Report RR-93-11, DFKI, Saarbrücken, Germany, 1993.
- Neu89
-
Bernd Neumann.
Natural language description of time-varying scenes.
In D. Waltz, editor, Semantic Structures, pages 167--207.
Lawrence Erlbaum, Hillsdale, New Jersey, 1989.
- NH88
-
A. Netravali and B. Haskell.
Digital Pictures.
Plenum Press, New York, 1988.
- Nis86
-
Nishida.
Speech recognition enhancement by lip information.
ACM SIGCHI Bulletin, 17(4):198--204, April 1986.
- Par74
-
F. I. Parke.
A parametric model for human faces.
PhD thesis, University of Utah, Department of Computer Sciences,
1974.
- PB81
-
S. M. Platt and N. I. Badler.
Animating facial expressions.
Computer Graphics, 15(3):245--252, 1981.
- PBBB88
-
Eric Petajan, Bradford Bischoff, David Bodoff, and N. Michael Brooke.
An improved automatic lipreading system to enhance speech
recognition.
CHI 88, pages 19--25, 1988.
- PBV94
-
Catherine Pelachaud, Norman Badler, and Marie-Luce Viaud.
Final report to NSF of the standards for facial animation workshop.
Technical report, University of Pennsylvania, Philadelphia, October
1994.
- Pet84
-
Eric Petajan.
Automatic Lipreading to Enhance Speech Recognition.
PhD thesis, University of Illinois at Urbana-Champaign, 1984.
- PF91
-
Christine Podilchuk and Nariman Farvardin.
Perceptually based low bit rate video coding.
In Proceedings of the 1991 International Conference on
Acoustics, Speech, and Signal Processing, volume 4, pages 2837--2840,
Toronto, May 1991. Institute of Electrical and Electronics Engineers.
- Pie61
-
J. R. Pierce.
Symbols, Signals and Noise.
Harper and Row, New York, 1961.
- PJN90
-
C. Podilchuk, N. S. Jayant, and P. Noll.
Sparse codebooks for the quantization of non-dominant sub-bands in
image coding.
In Proceedings of the 1990 International Conference on
Acoustics, Speech, and Signal Processing, pages 2101--2104, Albuquerque, New
Mexico, April 1990. Institute of Electrical and Electronics Engineers.
- PM89
-
Alex Pentland and Kenji Mase.
Lip reading: Automatic visual recognition of spoken words.
Technical Report MIT Media Lab Vision Science Technical Report 117,
Massachusetts Institute of Technology, January 1989.
- Rab89
-
L. R. Rabiner.
A tutorial on hidden Markov models and selected applications in
speech recognition.
Proceedings of the IEEE, 77(2):257--286, February 1989.
- RM94
-
Ram Rao and Russell Mersereau.
Lip modeling for visual speech recognition.
In Asilomar [Asi94].
- RMS+92
-
David B. Roe, Pedro J. Moreno, Richard W. Sproat, Fernando C. N. Pereira,
Michael D. Riley, and Alejandro Macarron.
A spoken language translator for restricted-domain context-free
languages.
Speech Communication, 11:311--319, 1992.
System demonstrated by AT&T Bell Labs and Telefónica de España,
VEST, World's Fair Exposition, Barcelona, Spain.
- RS78
-
Lawrence R. Rabiner and Ronald W. Schafer.
Digital Processing of Speech Signals.
Signal Processing. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
- Sal90
-
M. W. Salisbury.
Talk and draw: Bundling speech and graphics.
IEEE Computer, pages 59--65, August 1990.
- Sil93
-
Peter Silsbee.
Computer Lipreading for Improved Accuracy in Automatic Speech
Recognition.
PhD thesis, The University of Texas at Austin, 1993.
- Sil94
-
Peter Silsbee.
Sensory integration in audiovisual automatic speech recognition.
In Asilomar [Asi94].
- Smi89
-
Steve Smith.
Computer lip reading to augment automatic speech recognition.
Speech Tech, pages 175--181, 1989.
- SP54
-
W. H. Sumby and I. Pollack.
Visual contribution to speech intelligibility in noise.
Journal of the Acoustical Society of America, 26:212--215,
1954.
- SS93
-
J. Schirra and E. Stopp.
ANTLIMA---a listener model with mental images.
In Proceedings of the 13th International Joint Conference on
Artificial Intelligence, pages 175--180, Chambéry, France, August 1993.
- STHN90
-
M. Saintourens, M. H. Tramus, H. Huitric, and M. Nahas.
Creation of a synthetic face speaking in real time with a synthetic
voice.
In Gérard Bailly and Christian Benoît, editors,
Proceedings of the First ESCA Workshop on Speech Synthesis, pages
249--252, Autrans, France, 1990. European Speech Communication Association.
- Sum79
-
Q. Summerfield.
Use of visual information for phonetic perception.
Phonetica, 36:314--331, 1979.
- Sum87
-
Quentin Summerfield.
Some preliminaries to a comprehensive account of audio-visual speech
perception.
In Barbara Dodd and Ruth Campbell, editors, Hearing by Eye: The
Psychology of Lipreading, pages 3--51. Lawrence Erlbaum, Hillsdale, New
Jersey, 1987.
- SWL92
-
David Stork, Greg Wolff, and Earl Levine.
Neural network lipreading system for improved speech recognition.
In Proceedings of the 1992 International Joint Conference on
Neural Networks, Baltimore, Maryland, 1992.
- Van86
-
C. Vandeloise.
L'espace en français: sémantique des prépositions
spatiales.
Seuil, Paris, 1986.
- Vil94
-
L. Vila.
A survey on temporal reasoning in artificial intelligence.
AICOM, 7(1):4--28, 1994.
- VY92
-
M. L. Viaud and H. Yahia.
Facial animation with wrinkles.
In Proceedings of the 3rd Workshop on Animation, Eurographics
'92, Cambridge, England, 1992.
- WAF+93
-
Wolfgang Wahlster, Elisabeth André, W. Finkler, H.-J. Profitlich, and Thomas
Rist.
Plan-based integration of natural language and graphics generation.
Artificial Intelligence, pages 387--427, 1993.
- Wah89
-
W. Wahlster.
One word says more than a thousand pictures: On the automatic
verbalization of the results of image sequence analysis systems.
Computers and Artificial Intelligence, 8:479--492, 1989.
- Wai93
-
A. Waibel.
Multimodal human-computer interaction.
In Eurospeech '93, Proceedings of the Third European Conference
on Speech Communication and Technology, volume Plenary, page 39, Berlin,
September 1993. European Speech Communication Association.
- Wat87
-
K. Waters.
A muscle model for animating three-dimensional facial expression.
In Proceedings of Computer Graphics, volume 21, pages 17--24,
1987.
- Web88
-
B. L. Webber.
Tense as discourse anaphor.
Computational Linguistics, 14(2):61--73, 1988.
- WMJB83
-
W. Wahlster, H. Marburger, A. Jameson, and S. Busemann.
Over-answering yes-no questions: Extended responses in a NL
interface to a vision system.
In Proceedings of the 8th International Joint Conference on
Artificial Intelligence, pages 643--646, Karlsruhe, 1983.
- WRLG90
-
J. Wilpon, L. Rabiner, C. Lee, and E. Goldman.
Automatic recognition of key words in unconstrained speech using
hidden Markov models.
IEEE Transactions on Acoustics, Speech and Signal Processing,
38(11):1870--1878, 1990.
- YGS89
-
Ben Yuhas, Moise Goldstein, and Terrence Sejnowski.
Integration of acoustic and visual speech signals using neural
networks.
IEEE Communications Magazine, pages 65--71, 1989.
- YYI+92
-
A. Yamada, T. Yamamoto, H. Ikeda, T. Nishida, and S. Doshita.
Reconstructing spatial images from natural language texts.
In Proceedings of the 14th International Conference on
Computational Linguistics, pages 1279--1283, Nantes, France, August 1992.
ACL.