
AN ENVIRONMENT FOR ILLUSTRATED BRIEFING AND FOLLOW-UP SEARCH OVER LIVE MULTIMEDIA INFORMATION

Alfred V. Aho (*), Shih-Fu Chang (**), Kathleen R. McKeown (*)

(*) Department of Computer Science

(**) Department of Electrical Engineering

Columbia University

CONTACT INFORMATION

Kathleen R. McKeown
1214 Amsterdam
450 Computer Science Building
Department of Computer Science
New York, N.Y. 10027
Phone: (212) 939-7118
Fax: (212) 666-0140
Email: kathy@cs.columbia.edu

WWW PAGE

http://www.cs.columbia.edu/~radev/stimulate

PROGRAM AREA

Virtual Environments

KEYWORDS

summarization; multimedia search and manipulation; event tracking; illustrated briefings; multimedia integration; content-based visual search.

PROJECT SUMMARY

In today's online world of constantly changing information, people must filter large quantities of multimedia information on a daily basis. Our research focuses on the development of technologies to aid people in finding the information they seek. We envision an environment that provides up-to-the-minute briefings on topics of interest, linking the user into an integrated collection of related multimedia documents.

Based on a user profile or query, multimedia information will be filtered to the documents that match the user's interests, and a summary of the matching documents will be generated automatically to give the user an overview of the information they contain. A representative set of images or video matching the user's interests will also be provided, along with hypertext links into the documents themselves. The user can then follow up with multimedia queries to refine the search for related images, video, or additional articles.

Our work will have three main components: summarization, multimedia search and manipulation, and event tracking.

Integration of multiple media in all components is a focus of our work.

A key feature of our work on summarization is the generation of summaries over multiple articles on the same event. Our research will develop techniques to present multiple viewpoints and changes in perception over time, to update a user on live sources of data since the last summary was received, and to merge information from both textual and non-textual sources. Input will include live news sources, on-line databases, and ontologies. Research focuses on the problem of generation, assuming that information extracted from different sources will be given as input. In particular, we will develop planning operators that determine how to link information from different articles and sources, as well as lexical constraints that can be used in determining how to convey information concisely. Summarization will initially be restricted to the domain of news articles on terrorism, using information extraction systems developed under the Message Understanding Conferences organized by DARPA. In this domain, information extraction is possible and a variety of online images can be found.

Our work on multimedia search and manipulation will focus on three aspects. First, we will develop fully automated techniques for extracting effective visual features and localized image objects from both compressed and uncompressed images/video. Fusion of multiple features will be used to support powerful tools for searching visual materials according to their visual content. The compressed-domain approach will enable highly efficient implementation, due to the greatly reduced data rate in the compressed domain and the elimination of expensive decoding. We will also design efficient indexing methods that support flexible spatial queries over arbitrarily-shaped image objects. Second, we will investigate robust feature clustering techniques, using both visual features and textual features, for automated image/video subject cataloguing. Algorithms that learn from user interaction will be used to enhance classification accuracy and adaptively select optimal features. An enhanced image taxonomy, based on our current working system, will be developed using semi-automatic methods. Third, content search techniques and efficient manipulation techniques will be applied to compressed video to support ubiquitous, real-time multimedia access and manipulation, such as that desired by mobile journalists reporting breaking news in the field.

Resources of different media will be integrated at all levels. This integration will be used both to aid multimedia search and to augment summarization, resulting in illustrated briefings. Textual material generated by summarization will be used to enhance multimedia search, navigation, and cataloguing. Conversely, image categorization tools will be used to cluster the set of images retrieved in a search, so that representative images from each category can be selected to illustrate the generated textual summary.
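The selection of representative images per category can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: it assumes each retrieved image already carries a category label and a numeric feature vector, and it picks, for each category, the image closest to the category mean. The names `representatives` and `squared_distance` and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def representatives(images: Dict[str, Tuple[str, Vector]]) -> Dict[str, str]:
    """Map each category to the image nearest the category's mean features.

    `images` maps an image name to (category label, feature vector).
    The labels are assumed to come from an earlier categorization step.
    """
    by_category = defaultdict(list)
    for name, (category, features) in images.items():
        by_category[category].append((name, features))

    chosen: Dict[str, str] = {}
    for category, members in by_category.items():
        dim = len(members[0][1])
        centroid = [sum(f[i] for _, f in members) / len(members) for i in range(dim)]
        best = min(members, key=lambda m: squared_distance(m[1], centroid))
        chosen[category] = best[0]
    return chosen
```

Choosing an actual member of the category, rather than the mean itself, guarantees that the illustration shown next to the summary is a real retrieved image.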

Results from content-based visual search may identify repeated or relevant images, which will in turn trigger automated summarization of the multiple documents that link to those images.

Our research in the area of event tracking will focus on two problems: using both textual and visual features to initially spot an event of interest, and tracking documents on the same event over time, detecting surface differences between the documents. To spot an event of interest, we will rely on information retrieval engines available online and on text categorization techniques, along with our visual search system. We will focus on the use of different forms of patterns, ranging from key words and grammatical structures to visual features, over the different kinds of information associated with the input documents.

PROJECT REFERENCES

A. Aho, S.-F. Chang, K. McKeown, D. Radev, J. Smith, and K. Zaman, "Columbia Digital News Systems," IEEE International Conference on the Advances in Digital Libraries, Washington D.C., May 1997.

S.-F. Chang, "Exploring Functionalities In The Image/Video Compressed Domain," ACM Computing Surveys, Volume 27, Number 4, December 1995, pp. 573-575.

S.-F. Chang, J. R. Smith, H. J. Meng, H. Wang, and D. Zhong, "Finding Images/Video in Large Archives- Columbia's Content-Based Visual Query Projects," CNRI Digital Library Magazine, Feb. 1997.

Kathleen R. McKeown, Karen Kukich, and Jacques Robin. Generating concise natural language summaries. Journal of Information Processing and Management, 31(5), September 1995, pp. 703-733, Special Issue on Summarization.

Kathleen R. McKeown and Dragomir Radev. Generating summaries of multiple news articles. In Proceedings of SIGIR, July 1995. Seattle, Washington.

Jacques Robin and Kathleen R. McKeown. Empirically designing and evaluating a new revision-based model for summary generation. Artificial Intelligence Journal, 85, August 1996, Special Issue on Empirical Methods.

Dragomir R. Radev and Kathleen R. McKeown. Building a Generation Knowledge Source using Internet-Accessible Newswire. In Proceedings of the Conference on Applied Natural Language Processing, April 1997, Washington, D.C.

J. Meng, D. Zhong, and S.-F. Chang, "WebClip: A WWW Video Editing/Browsing System," IEEE 1st Multimedia Signal Processing Workshop, June 1997, Princeton, NJ.

J. R. Smith and S.-F. Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," ACM Multimedia Conference, Boston, MA, Nov. 1996.

J. R. Smith and S.-F. Chang, "Searching for Images and Videos on the World-Wide Web," to appear in IEEE Multimedia Magazine, Summer, 1997. (also Columbia U. CU/CTR Technical Report #459-96-25).

AREA BACKGROUND

Today's online world contains a myriad of information, constantly changing and evolving. Effective information filtering in such environments according to user interests is a challenging task.

Most work in summarization, image search, and event tracking has been done independently. Research on summarization to date has focused on the use of statistical techniques to extract key sentences that can serve as a summary. Research on image search includes traditional approaches using keywords and new content-based approaches using multimedia features (image/video, audio, and text). Related research in event tracking includes on-line text categorization and text filtering.

Summarization, image search, and event tracking fall within three main areas of research: natural language generation, visual information search, and information retrieval.

Our research will build on existing techniques to extract information from text, augmenting existing information extraction systems developed under the DARPA message understanding program. Using information extracted from such systems as input to our system, we will focus on the use of language generation techniques to plan the content and wording of the generated summary. A language generation system typically consists of two modules, a content planner and a surface generator. A content planner determines what information to include in a generated text and reasons at the conceptual level. For this research, we will use content planning operators to identify similarities and differences across information extracted from multiple articles. A surface generator takes the information to be communicated and determines the words to use and their linear ordering in a text. Our focus here will be on re-use of words and phrases extracted from the input text in the generation process, along with new words and phrases, to produce a concisely worded summary.
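The two-module organization described above can be sketched with a toy example. This is a hedged illustration only, not the project's generator: the record layout, the slot names, the relation labels "similarity" and "difference", the sentence templates, and the sample data are all hypothetical. The content planner compares slot values across records extracted from several articles; the surface generator then chooses wording and order.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # one extracted record: slot name -> value, plus "source"
PlanItem = Tuple[str, str, Dict[str, List[str]]]  # (relation, slot, value -> sources)


def plan_content(records: List[Record], slots: List[str]) -> List[PlanItem]:
    """Content planner: decide, per slot, whether the sources agree or differ."""
    plan: List[PlanItem] = []
    for slot in slots:
        values: Dict[str, List[str]] = {}
        for record in records:
            if slot in record:
                values.setdefault(record[slot], []).append(record["source"])
        if not values:
            continue  # no source reported this slot
        relation = "similarity" if len(values) == 1 else "difference"
        plan.append((relation, slot, values))
    return plan


def realize(plan: List[PlanItem]) -> str:
    """Surface generator: choose words and linear order for each planned item."""
    sentences = []
    for relation, slot, values in plan:
        if relation == "similarity":
            value, sources = next(iter(values.items()))
            sentences.append(f"{' and '.join(sources)} report that the {slot} was {value}.")
        else:
            parts = [f"{' and '.join(sources)} put the {slot} at {value}"
                     for value, sources in values.items()]
            sentences.append("; ".join(parts) + ".")
    return " ".join(sentences)


# Hypothetical extracted records from two articles on the same event.
records = [
    {"source": "Source A", "location": "Springfield", "death toll": "3"},
    {"source": "Source B", "location": "Springfield", "death toll": "5"},
]
print(realize(plan_content(records, ["location", "death toll"])))
```

Keeping the planner's output as data, separate from the wording, is what lets the same plan be realized more or less concisely by a different surface generator.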

In the area of image search, we pursue research known as content-based visual search. In the past, images and videos were indexed by manually assigned keywords or metadata, or classified into proprietary domain taxonomies. Content-based image search complements the text-based approaches with automatic indexing of visual features (e.g., color, shape, texture, spatio-temporal structures of objects) and other multimedia features (such as speech, audio, associated text). Users issue image queries by giving examples, drawing visual sketches, or giving natural language input (such as keywords and speech). The search systems find matching images and videos whose statistical and/or structural features are similar to the query input. Models of high-level semantic objects (such as human portraits) can also be built using the multimedia features and their spatio-temporal constraints in order to automatically classify new images/video from on-line sources.
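Query by example over a single visual feature can be sketched as follows. This is a minimal illustration assuming a quantized color histogram as the feature and histogram intersection as the similarity measure; it is not a description of the project's search systems, and the function names and the in-memory pixel lists are hypothetical stand-ins for decoded images.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized histogram over quantized RGB colors."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)

    def quantize(value: int) -> int:
        # Clamp so the top bin absorbs any remainder when 256 is not divisible.
        return min(value // step, bins_per_channel - 1)

    for r, g, b in pixels:
        index = (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        hist[index] += 1.0
    total = float(len(pixels)) or 1.0
    return [count / total for count in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1]: the mass the two normalized histograms share."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_example(query: List[Pixel],
                    collection: Dict[str, List[Pixel]]) -> List[Tuple[str, float]]:
    """Rank images in the collection by color similarity to the query image."""
    query_hist = color_histogram(query)
    scored = [(name, histogram_intersection(query_hist, color_histogram(pixels)))
              for name, pixels in collection.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A practical system would combine several such features (shape, texture, spatial layout) and index them rather than scanning the whole collection per query.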

Our research in event tracking focuses on information sources on the World Wide Web. Our goal is to effectively detect changes in a Web user's information sources of choice and provide appropriate information about these changes in a timely and comprehensible manner. Many of the existing systems used to detect changes in Web pages focus on reporting that a change has taken place and not on the nature of the change itself. Some work has been carried out in which the changes are represented as markups in a displayed document, but this work does not exploit the fact that the HTML source can be more appropriately modelled hierarchically as a tree and that not all changes in a Web page would be of equal interest to a user. Work has also been done on detecting changes in structured documents, but this work has dealt primarily with algorithmic issues rather than a specific solution for detecting differences in Web pages. Our work exploits the fact that Web pages have different functionalities and that these functionalities can be captured by a structural analysis of the document. The changes of interest in a Web page depend upon its functionality. We focus on classifying a given Web page into an ontology based on its functionalities and on detecting the changes of importance corresponding to the node in the ontology. We model the HTML document as a tree and make use of dynamic programming techniques to efficiently compute the changes.
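The idea of comparing two versions of a page as trees with dynamic programming can be sketched as follows. This is a simplified top-down variant under stated assumptions, not the project's algorithm and not the general tree edit distance of Zhang and Shasha: the roots are always matched, whole subtrees are inserted or deleted at a cost equal to their size, and relabeling a node costs 1. The `Node` class and the unit costs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of a parsed page, e.g. a tag name or a tag plus its text."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def tree_distance(a: Node, b: Node) -> int:
    """Cost of turning tree `a` into tree `b` with the roots matched."""
    root_cost = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)

    # table[i][j]: cheapest way to align the first i children of `a`
    # with the first j children of `b`.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        table[i][0] = table[i - 1][0] + size(a.children[i - 1])  # delete subtree
    for j in range(1, n + 1):
        table[0][j] = table[0][j - 1] + size(b.children[j - 1])  # insert subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = min(
                table[i - 1][j] + size(a.children[i - 1]),
                table[i][j - 1] + size(b.children[j - 1]),
                table[i - 1][j - 1] + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return root_cost + table[m][n]
```

To weight changes by their interest to a user, the unit costs could be replaced by costs that depend on the node's label or on the page's class in the ontology; that extension is left out of this sketch.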

AREA REFERENCES

S.-F. Chang, "Content-Based Indexing and Retrieval of Visual Information," IEEE Signal Processing Magazine, July 1997.

Amarnath Gupta and Ramesh Jain, "Visual Information Retrieval," Communications of the ACM, May 1997, pp. 70-79, Vol. 40, No. 5.

R. K. Srihari, "Automatic Indexing and Content-Based Retrieval of Captioned Images", IEEE Computer Magazine, Sep. 1995, Vol 28, No 9, pp. 49-58.

Special Issue on Content-Based Image Query, IEEE Computer Magazine, Sep. 1995, Vol.28, No.9.

Kathleen R. McKeown. Text generation: Using discourse strategies and focus constraints to generate natural language text. Cambridge University Press, Cambridge, England, 1985.

Paice, Chris D. Constructing Literature Abstracts by Computer: Techniques and Prospects, Information Processing and Management, Vol 26, 1990, pp. 171-186.

Proceedings of any of the DARPA Message Understanding Conferences; for example, Proceedings of the Fourth Message Understanding Conference (MUC-4), DARPA Software and Intelligent Systems Technology Office, 1992.

Sparck-Jones, Karen. What Might Be In A Summary? Proceedings of Information Retrieval 93: Von der Modellierung zur Anwendung, Universitatsverlag Konstanz, 1993, pp. 9-26.

Url-minder. http://www.netmind.com/URL-minder/URL-minder.html

Fred Douglis and Thomas Ball, "Tracking and Viewing Changes on the Web", 1996 Usenix Technical Conference.

S. Chawathe and H. Garcia-Molina, "Meaningful change detection in structured data", Proceedings of the ACM SIGMOD International Conference on Management of data, Tucson, Arizona, May 1997.

K. Zhang and D. Shasha, "Simple fast algorithms for the editing distance between trees and related problems", SIAM Journal on Computing, 18(6):1245-1262, 1989.

D.S. Hirschberg, "Algorithms for the longest common subsequence problem", Journal of the ACM, 24(4):644-675, October 1977.

WebClassify. http://www.cs.columbia.edu/~zkazi/cgi-bin/classify/tester.html

RELATED PROGRAM AREAS

Adaptive Human Interfaces; Speech and Natural Language Understanding; Other Communication Modalities.

POTENTIAL RELATED PROJECTS

Our project could benefit from interaction with others who are working on information extraction. As we work on summarization, it is clear that the information extraction systems developed under the DARPA message understanding program could be enhanced to provide information suitable for this new task. For example, for summarization it is helpful to know the source of each article. This information is not currently extracted but could easily be added. In addition, it would be helpful to have tools that allow us to build and experiment with our own patterns for information extraction so that we can easily augment and modify existing systems. In general, working in close collaboration with a project that focuses on information extraction could be beneficial both to us and to the information extraction team. Interaction with others who are working on audio-based multimedia indexing would also benefit our work on image/video search, which has a primary focus on visual and textual aspects.