
Visual Language Analysis: From Low Level Feature Representation To Semantic Distance Learning

Posted on: 2011-12-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Wu
Full Text: PDF
GTID: 1118360305466635
Subject: Signal and Information Processing
Abstract/Summary:
With the development of the Internet, Web image resources have proliferated. Many research problems arise with Web images, such as image annotation, image retrieval, search result clustering, near-duplicate detection, image tag recommendation, image indexing, image classification, and object detection. All these research topics have to deal with one intrinsic and fundamental problem: visual semantic representation and measurement. This problem has therefore become an active research direction in both academia and industry.

Visual semantic representation and measurement can currently be divided into four elements: image representation, image similarity measurement, concept representation, and concept correlation measurement. Image representation refers to image features and their arrangement; both the feature types and the arrangements can vary. Image similarity measurement is built on the image representation and can be learned with machine learning techniques; it can differ considerably depending on the image features and representation chosen. Concept representation refers to the features of a concept and their arrangement. The features of a concept can be generated from a collection of images related to that concept. There are many successful concept modeling methods, such as the 2D hidden Markov model and conditional random fields. Compared with these complex models, the visual language model we propose is simple and effective. We also propose the semantic preserving bag-of-words model to help address the semantic gap problem. The correlation between concepts is measured on the concept models. As far as we know, existing concept distance measurements include WordNet distance and Google distance. These distances are based on human labor or text information, while our proposed Flickr distance measures concept distance based on visual information.

This thesis proposes a series of models and methods for visual semantic representation and measurement. The contributions cover not only low-level features and representation but also high-level models and measurements. The published papers cover all four elements of the research problem, and the thesis offers an exploratory study of visual semantic representation and measurement. The concrete contributions are as follows:

1. We propose the visual language model (VLM) to bridge the gap between text analysis and visual analysis. We believe that local visual features follow a certain grammar, similar to the words in text documents. By analyzing the local features, we can estimate the semantics of an image. Since this model is similar to the language model in text analysis, many techniques from text analysis can be carried over. Experimental results show that the model is effective and much faster than other complex models.

2. We propose the semantic preserving bag-of-words model to handle the semantic gap problem. We propose a novel measurement of the semantic gap and seek the best mapping space for translating visual features into visual words so that the semantic gap is minimized. In this way, we can learn a more discriminative dictionary. Experiments show that the optimal dictionary significantly improves the performance of image annotation.

3. We propose the probabilistic relevance component analysis method (pRCA) to improve image similarity measurement. pRCA represents the side information between images in a probabilistic form rather than a binary form, which helps distance learning. Experiments on Web image annotation show that the method is much better in accuracy and efficiency than other distance metric learning methods.

4. We propose a concept distance measurement based on visual information, called Flickr distance. We believe correlated concepts have a better chance of appearing in the same image. We can therefore measure the distance between concepts by the difference between their visual language models. Unlike text-based concept distance measurements, Flickr distance uses the visual information related to each concept. It can be effective for multimedia-related tasks and is more consistent with human cognition.

5. We extend traditional linear distance metric learning to non-linear distance function learning by proposing Bregman distance function learning. Traditional Mahalanobis distance learning aims to learn a distance matrix that is static over the whole sample space. Since the sample distribution varies across that space, it makes sense to take local information about the distribution into account by adopting a non-linear distance function. Experiments show that the proposed Bregman distance handles distance learning problems in high-dimensional spaces better.

6. We extend statistical distance measurement to dynamic distance measurement by proposing the QOSS subspace shifting method. We believe that distance can be quite different in different metric spaces. To judge whether two samples are similar, it is better to measure their distance in multiple spaces rather than in a single space. The proposed method automatically chooses the subspaces used for distance measurement. Experiments on Web image near-duplicate detection show that our method converges in fewer than 5 iterations and that detection precision is significantly improved.
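
The idea behind contributions 1 and 4 can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify the model order, the smoothing, or the divergence, so the bigram over horizontally adjacent visual words, the add-alpha smoothing, the Jensen-Shannon divergence, and all names (`bigram_distribution`, `js_divergence`, the toy grids) are assumptions chosen for illustration only.

```python
from collections import Counter
from math import log2


def bigram_distribution(grids, vocab_size, alpha=1.0):
    """Smoothed joint distribution over horizontally adjacent visual-word pairs.

    grids: list of 2-D lists of visual-word ids, one grid per image
           (hypothetical input; real grids would come from quantized patches).
    alpha: add-alpha smoothing constant (an assumption, not from the thesis).
    """
    counts = Counter()
    for grid in grids:
        for row in grid:
            # Count each (left neighbour, current word) pair in the row.
            for left, right in zip(row, row[1:]):
                counts[(left, right)] += 1
    total = sum(counts.values()) + alpha * vocab_size ** 2
    return {
        (a, b): (counts[(a, b)] + alpha) / total
        for a in range(vocab_size)
        for b in range(vocab_size)
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two distributions on the same keys."""
    def kl(x, m):
        return sum(x[k] * log2(x[k] / m[k]) for k in x if x[k] > 0)

    mid = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


# Toy image collections for three concepts; every number is a visual-word id.
concept_images = {
    "sky": [[[0, 0, 1], [0, 0, 1]]],
    "sea": [[[0, 0, 1], [0, 1, 1]]],
    "car": [[[2, 3, 2], [3, 2, 3]]],
}
models = {
    name: bigram_distribution(grids, vocab_size=4)
    for name, grids in concept_images.items()
}

d_sky_sea = js_divergence(models["sky"], models["sea"])
d_sky_car = js_divergence(models["sky"], models["car"])
print(f"sky-sea: {d_sky_sea:.4f}  sky-car: {d_sky_car:.4f}")
```

In this toy setup the two concepts built from similar word layouts come out closer than the unrelated one. Real use would first quantize local image features into a visual-word vocabulary and estimate each concept's model from many images.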
Keywords/Search Tags: Visual language model, visual analysis, concept representation, semantic modeling, distance metric learning, image annotation, tag recommendation