
Narrowing down the semantic gap between content and context using multimodal keywords

Posted on: 2010-07-13
Degree: Ph.D.
Type: Dissertation
University: Wayne State University
Candidate: Agrawal, Rajeev
Full Text: PDF
GTID: 1448390002476934
Subject: Computer Science
Abstract/Summary:
Conventional information retrieval is based solely on text, and these text-based approaches have been transplanted into image retrieval in a variety of ways, including the representation of an image as a vector of feature values of different modalities. It is widely recognized that image retrieval techniques should integrate multiple modalities, such as color, texture, and text features. Although much effort has been devoted to combining these aspects of multimodal data, the gap between them remains a major barrier for researchers.

We propose an approach to narrow the gap between low-level image features (content) and the human interpretation of the image (context). Taking a cue from text-based retrieval techniques, we construct "visual keywords" by vector-quantizing small image tiles. Visual and text keywords are then combined to represent an image as a single multimodal vector. This multimodal image vector is analogous to a term vector in text document representation and helps uncover hidden image-to-image, text-to-text, and text-to-image relationships. We demonstrate the power of these visual keywords for image clustering, when used in tandem with textual keyword annotations, in the context of latent semantic analysis, a popular technique in classical information retrieval that has been used to reveal the underlying semantic structure of document collections.

We present a graph-theoretic, non-linear approach based on diffusion kernels to identify the relationship between the visual and text modalities. By comparing its performance with an approach based on low-level features, we demonstrate that visual keywords, when combined with textual keywords, significantly improve image clustering and retrieval results.

We also present a Bayesian probabilistic framework that uses visual keywords and already available text keywords to annotate images automatically. We estimate the conditional probability of a text keyword given the visual keywords, which are described by a multivariate Gaussian distribution. We demonstrate the effectiveness of this approach by comparing predicted text annotations with manual annotations, and we analyze the effect of text annotation length on performance.
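The visual-keyword construction described above can be illustrated concretely. The following is a minimal Python sketch, not the dissertation's actual implementation: it vector-quantizes image tiles with k-means, builds the combined multimodal vector, and embeds the collection with latent semantic analysis via truncated SVD. The tile size (8x8), vocabulary size (500), and SVD rank are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def extract_tiles(img, tile=8):
    """Cut an (H, W, 3) image into flattened non-overlapping tiles."""
    h, w = img.shape[0] // tile, img.shape[1] // tile
    return np.array([img[i*tile:(i+1)*tile, j*tile:(j+1)*tile].reshape(-1)
                     for i in range(h) for j in range(w)], dtype=float)

def build_visual_vocabulary(images, tile=8, vocab_size=500, seed=0):
    """K-means centroids over all tiles play the role of 'visual keywords'."""
    all_tiles = np.vstack([extract_tiles(im, tile) for im in images])
    return KMeans(n_clusters=vocab_size, n_init=10, random_state=seed).fit(all_tiles)

def multimodal_vector(img, text_vec, vocab, tile=8):
    """Histogram of visual-keyword occurrences in the image, concatenated
    with the image's text term vector: one vector, two modalities."""
    labels = vocab.predict(extract_tiles(img, tile))
    hist = np.bincount(labels, minlength=vocab.n_clusters).astype(float)
    return np.concatenate([hist, text_vec])

def lsa_embed(doc_vectors, k=50):
    """Latent semantic analysis: truncated SVD of the term-by-image matrix
    projects visual and text keywords into one low-rank semantic space
    (k must be smaller than the vector dimension)."""
    return TruncatedSVD(n_components=k).fit_transform(np.vstack(doc_vectors))
```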
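The abstract does not spell out the diffusion kernel construction. A standard formulation is the Kondor-Lafferty diffusion kernel K = exp(-beta * L) over a graph Laplacian; the sketch below assumes an image-similarity graph built from cosine affinity of the multimodal vectors, and both the affinity choice and beta are illustrative assumptions rather than the dissertation's settings.

```python
import numpy as np
from scipy.linalg import expm

def cosine_affinity(X):
    """Edge weights from cosine similarity of multimodal image vectors
    (rows of X); an assumed graph construction, for illustration only."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)          # no self-loops
    return np.clip(W, 0.0, None)      # keep weights non-negative

def diffusion_kernel(W, beta=0.5):
    """Kondor-Lafferty diffusion kernel K = exp(-beta * L), where
    L = D - W is the graph Laplacian; beta controls how far similarity
    diffuses along the graph, giving a non-linear notion of relatedness."""
    L = np.diag(W.sum(axis=1)) - W
    return expm(-beta * L)            # matrix exponential
```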
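For the Bayesian annotation framework, one plausible reading of the abstract is a class-conditional model: fit a multivariate Gaussian to the visual-keyword vectors of training images carrying each text keyword, then rank keywords for a new image by posterior probability. The helper names, the uniform regularization term, and the assumption that each keyword has several training examples are all illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_annotation_model(vis_vecs, text_labels, text_vocab):
    """For each text keyword w, fit a multivariate Gaussian over the
    visual-keyword vectors of images annotated with w (assumes each
    keyword appears on at least a few training images)."""
    model = {}
    for w in text_vocab:
        X = np.array([v for v, tags in zip(vis_vecs, text_labels) if w in tags])
        prior = len(X) / len(vis_vecs)                     # P(w)
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])  # regularize
        model[w] = (prior, mu, cov)
    return model

def annotate(vis_vec, model, n=5):
    """Rank keywords by log posterior: log P(w) + log N(v; mu_w, cov_w)."""
    scores = {w: np.log(p) + multivariate_normal.logpdf(vis_vec, mu, cov)
              for w, (p, mu, cov) in model.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```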
Keywords/Search Tags: Text, Image, Keywords, Approach, Retrieval, Multimodal, Gap, Semantic