
Narrowing down the semantic gap between content and context using multimodal keywords

Posted on: 2010-07-13
Degree: Ph.D.
Type: Dissertation
University: Wayne State University
Candidate: Agrawal, Rajeev
Full Text: PDF
GTID: 1448390002476934
Subject: Computer Science
Abstract/Summary:
Conventional information retrieval is based solely on text, and these text-based approaches have been transplanted into image retrieval in a variety of ways, including the representation of an image as a vector of feature values of different modalities. It is widely recognized that image retrieval techniques should integrate multiple modalities, such as color, texture, and text features. Although much effort has been devoted to combining these aspects of multimodal data, the gap between them remains a major barrier for researchers.

We propose an approach to narrow the gap between low-level image features (content) and the human interpretation of the image (context). Taking a cue from text-based retrieval techniques, we construct "visual keywords" by vector-quantizing small image tiles. Visual and text keywords are then combined to represent an image as a single multimodal vector. This multimodal image vector is analogous to a term vector in text document representation and helps uncover hidden image-to-image, text-to-text, and text-to-image relationships. We demonstrate the power of these visual keywords for image clustering, when used in tandem with textual keyword annotations, in the context of latent semantic analysis, a popular technique in classical information retrieval that has been used to reveal the underlying semantic structure of document collections.

We present a graph-theoretic, non-linear approach based on diffusion kernels to identify the relationship between the visual and text modalities. By comparing its performance with an approach based on low-level features, we demonstrate that visual keywords, when combined with textual keywords, significantly improve image clustering and retrieval results.

We also present a Bayesian probabilistic framework that uses visual keywords and already available text keywords to annotate images automatically. We estimate the conditional probability of a text keyword given the visual keywords, which are described by a multivariate Gaussian distribution. We demonstrate the effectiveness of this approach by comparing predicted text annotations with manual annotations, and we analyze the effect of text annotation length on performance.
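The visual-keyword construction described above can be illustrated concretely. The following is a minimal Python sketch, not the dissertation's actual implementation: it vector-quantizes image tiles with k-means, builds the combined multimodal vector, and embeds the collection with latent semantic analysis via truncated SVD. The tile size (8x8), vocabulary size (500), and SVD rank are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def extract_tiles(img, tile=8):
    """Cut an (H, W, 3) image into flattened non-overlapping tiles."""
    h, w = img.shape[0] // tile, img.shape[1] // tile
    return np.array([img[i*tile:(i+1)*tile, j*tile:(j+1)*tile].reshape(-1)
                     for i in range(h) for j in range(w)], dtype=float)

def build_visual_vocabulary(images, tile=8, vocab_size=500, seed=0):
    """K-means centroids over all tiles play the role of 'visual keywords'."""
    all_tiles = np.vstack([extract_tiles(im, tile) for im in images])
    return KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(all_tiles)

def multimodal_vector(img, text_vec, vocab, tile=8):
    """Histogram of visual-keyword occurrences in the image, concatenated
    with the image's text term vector: one vector, two modalities."""
    labels = vocab.predict(extract_tiles(img, tile))
    hist = np.bincount(labels, minlength=vocab.n_clusters).astype(float)
    return np.concatenate([hist, text_vec])

def lsa_embed(doc_vectors, k=50):
    """Latent semantic analysis: truncated SVD of the term-by-image matrix
    projects visual and text keywords into one low-rank semantic space
    (k must be smaller than the vector dimension)."""
    return TruncatedSVD(n_components=k).fit_transform(np.vstack(doc_vectors))
```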
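The abstract does not spell out the diffusion kernel construction. A standard formulation is the Kondor-Lafferty diffusion kernel K = exp(-beta * L) over a graph Laplacian; the sketch below assumes an image-similarity graph built from cosine affinity of the multimodal vectors, and both the affinity choice and beta are illustrative assumptions rather than the dissertation's settings.

```python
import numpy as np
from scipy.linalg import expm

def cosine_affinity(X):
    """Edge weights from cosine similarity of multimodal image vectors
    (rows of X); an assumed graph construction, for illustration only."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)          # no self-loops
    return np.clip(W, 0.0, None)      # keep weights non-negative

def diffusion_kernel(W, beta=0.5):
    """Kondor-Lafferty diffusion kernel K = exp(-beta * L), where
    L = D - W is the graph Laplacian; beta controls how far similarity
    diffuses along the graph, giving a non-linear notion of relatedness."""
    L = np.diag(W.sum(axis=1)) - W
    return expm(-beta * L)            # matrix exponential
```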
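For the Bayesian annotation framework, one plausible reading of the abstract is a class-conditional model: fit a multivariate Gaussian to the visual-keyword vectors of training images carrying each text keyword, then rank keywords for a new image by posterior probability. The helper names, the uniform regularization term, and the assumption that each keyword has several training examples are all illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_annotation_model(vis_vecs, text_labels, text_vocab):
    """For each text keyword w, fit a multivariate Gaussian over the
    visual-keyword vectors of images annotated with w (assumes each
    keyword appears on at least a few training images)."""
    model = {}
    for w in text_vocab:
        X = np.array([v for v, tags in zip(vis_vecs, text_labels) if w in tags])
        prior = len(X) / len(vis_vecs)                     # P(w)
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])  # regularize
        model[w] = (prior, mu, cov)
    return model

def annotate(vis_vec, model, n=5):
    """Rank keywords by log posterior: log P(w) + log N(v; mu_w, cov_w)."""
    scores = {w: np.log(p) + multivariate_normal.logpdf(vis_vec, mu, cov)
              for w, (p, mu, cov) in model.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```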
Keywords/Search Tags: Text, Image, Keywords, Approach, Retrieval, Multimodal, Gap, Semantic