
Multimodal indexing, retrieval, and browsing: Combining content-based image retrieval with text retrieval

James Allan, Allen Hanson, R. Manmatha

Computer Science Department
University of Massachusetts, Amherst

CONTACT INFORMATION

Computer Science
Lederle Graduate Research Center
University of Massachusetts
Box 34610
Amherst, MA 01003-4610
Home: Phone: +1 413/545-3240
Fax: +1 413/545-1789
Email: allan@cs.umass.edu

WWW PAGE

UMass' Stimulate page

PROGRAM AREA

Other Communication Modalities

KEYWORDS

image retrieval, text retrieval, text in images, combining text and images, image and text browsing

PROJECT SUMMARY

This project will integrate innovative image matching, sophisticated text retrieval, and new query formulation methods to create a powerful platform for indexing and retrieving text, images, and video. Automatic indexing of images is an emerging technology: in some settings, a system can match images by color content, texture, and even "appearance", but such methods are not well understood and are limited in scope. The project will develop new algorithms and mechanisms, and extend existing ones, to facilitate indexing and retrieving text, images, and video. It incorporates research in three areas. The culmination of the research will be a prototype search and browsing system that combines results from all three research areas. The prototype will allow a collection of images to be searched using text queries, image content, and text in images. It could also serve as a tool for more effective hand-annotation of an image collection.

PROJECT REFERENCES

M. Das, B. Draper, W. Lim, R. Manmatha, and E. M. Riseman, "A Fast Background Independent Retrieval Strategy for Color Image Databases." UMass Technical Report TR96-79, CIIR Technical Report MM-9, 1996.

M. Das, E. M. Riseman, and B. Draper, "FOCUS: Searching for Multi-colored Objects in a Diverse Image Database." Proc. of IEEE CVPR '97, pages 756-761, June 1997.

R. Manmatha, "Image matching under affine deformations." Proc. of the 27th Asilomar IEEE Conf. on Signals, Systems and Computers, pages 106-110, 1993.

R. Manmatha, "Measuring the affine transform using gaussian filters." Proc. 3rd European Conference on Computer Vision, pages 159-164, 1994.

R. Manmatha, "Multimedia Indexing and Retrieval Research at the Center for Intelligent Information Retrieval." Proc. of the Symposium on Document Image Understanding Technology 1997 (SDIUT'97).

R. Manmatha and W.B. Croft, "Word spotting: Indexing handwritten manuscripts." In Mark Maybury, editor, Intelligent Multi-media Information Retrieval Collection. AAAI Press, May 1997.

R. Manmatha, Chengfeng Han, and E. M. Riseman, "Word spotting: A new approach to indexing handwriting." Proc. Computer Vision and Pattern Recognition Conference, pages 631-637, 1996.

R. Manmatha, Chengfeng Han, E. M. Riseman, and W.B. Croft, "Indexing handwriting using word matching." Digital Libraries '96: 1st ACM International Conference on Digital Libraries, pages 151-159, 1996.

R. Manmatha and S. Ravela, "A Syntactic Characterization of Appearance and its Application to Image Retrieval." Proc. of the SPIE Conf. on Human Vision and Electronic Imaging II, Vol. 3016, San Jose, CA, 1997.

S. Ravela and R. Manmatha, "Retrieving Images by Similarity of Visual Appearance." To appear in the Proc. of the IEEE Workshop on Content Based Access of Image Databases, Puerto Rico, June 20, 1997.

S. Ravela, R. Manmatha, and E.M. Riseman, "Image retrieval using scale space matching." Proc. 4th European Conference on Computer Vision, pages 273-282, 1996.

S. Ravela, R. Manmatha, and E.M. Riseman, "Scale space matching and image retrieval." Proc. DARPA Image Understanding Workshop, 1996.

R. Swan and J. Allan, "Improving Interactive Information Retrieval Effectiveness with 3-D Graphics." CIIR Technical Report IR-100, Computer Science Dept, University of Massachusetts at Amherst, MA, 1996.

Victor Wu, R. Manmatha, and E.M. Riseman, "Extracting text from grayscale images." Technical Report CS-UM-95-88, Computer Science Dept, University of Massachusetts at Amherst, MA, 1995.

AREA BACKGROUND

The work on this project arises out of the study of Information Retrieval (IR). That field seeks to find texts that are relevant to a user without attempting a deep understanding of the language. Because IR omits deep natural language processing and understanding (NLP/NLU), its techniques have been applicable in a broad range of unrestricted subject areas and across a large number of applications.

IR has generally analyzed text for low-level statistical features that capture some notion of meaning. For example, the occurrence characteristics of words and phrases (how often they appear in a document, across all documents, etc.) reflect some of the meaning of a text.
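The occurrence characteristics mentioned above can be illustrated with a minimal Python sketch. It counts how often each word appears in a document (term frequency) and in how many documents it appears (document frequency), then combines them with the common tf-idf weighting. The weighting formula, the whitespace tokenizer, the sample documents, and the function names term_stats and tf_idf are assumptions made for illustration; the text above does not specify which statistics or formulas the project's systems use.

```python
import math
from collections import Counter


def term_stats(docs):
    """Return per-document term counts and per-term document frequencies.

    Tokenization is a naive lowercase whitespace split (an assumption;
    real IR systems use stemming, stop lists, phrase detection, etc.).
    """
    term_counts = [Counter(doc.lower().split()) for doc in docs]
    doc_freq = Counter()
    for counts in term_counts:
        # Each document contributes at most 1 to a term's document frequency.
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def tf_idf(term, doc_index, term_counts, doc_freq):
    """Weight a term by its in-document count times log(N / document frequency)."""
    if doc_freq[term] == 0:
        return 0.0  # term never seen in the collection
    n_docs = len(term_counts)
    return term_counts[doc_index][term] * math.log(n_docs / doc_freq[term])


# Hypothetical three-document collection.
docs = [
    "image retrieval by color and texture",
    "text retrieval with statistical features",
    "handwriting word spotting in image collections",
]
term_counts, doc_freq = term_stats(docs)
```

In this toy collection, "retrieval" occurs in two of the three documents, so it receives a lower weight in the first document than "color", which occurs only there. That is the sense in which simple counts reflect some of what a text is about.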

The CIIR has combined such IR techniques with methods in computer vision to do image retrieval: analyzing images for low-level features that do not fully capture the "meaning" of the picture, but reflect some of that meaning and can be used for some types of retrieval.
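One simple example of a low-level image feature is a coarse color histogram, compared here using histogram intersection. This is a generic, widely known technique shown only to make the idea concrete; it is not the project's own color, texture, or appearance matching method. The pixel representation (a list of RGB tuples), the bin count, and the function names color_histogram and histogram_intersection are assumptions.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized histogram over a coarse RGB grid.

    `pixels` is a list of (r, g, b) tuples with channel values in 0..255.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bin index 0..n-1.
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1.0
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical four-pixel "images".
red = color_histogram([(255, 0, 0)] * 4)
mostly_red = color_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)])
blue = color_histogram([(0, 0, 255)] * 4)

score_close = histogram_intersection(red, mostly_red)
score_far = histogram_intersection(red, blue)
```

The histogram says nothing about what the picture depicts, yet the mostly red image scores higher against the red one than the blue image does. A feature of this kind captures part of an image's content without capturing its full "meaning".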

AREA REFERENCES

G. Salton, Automatic Text Processing--the transformation, analysis and retrieval of information by computer. Addison-Wesley Publishing Co, Reading, MA, 1989.

G. Salton and M.J. McGill, Introduction to modern information retrieval. McGraw-Hill, New York, 1983.

RELATED PROGRAM AREAS

Speech and Natural Language Understanding
Adaptive Human Interfaces

POTENTIAL RELATED PROJECTS

We are interested in building interfaces for image and text retrieval that are appropriate for the needs of the user. A "catch-all" interface to an IR system is adequate for most purposes, but frustrating to someone with non-traditional requirements. Our view is of an interface that presents a generic representation at the start and accepts a user's query. Various optional tools monitor the result of the query and the interaction and, if they detect interesting patterns, signal that they may have something to offer. (For example, an image clustering routine might signal that it has found strong clusters in the retrieved set.) This form of adaptive interface is of great interest to us, though not directly related to our research under Stimulate.