Research On Image Semantic Representation And Metric Learning Technologies

Posted on: 2017-05-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Zhao
GTID: 1108330482979084
Subject: Information and Communication Engineering

Abstract/Summary:
With the rapid development of Internet and multimedia technology, the volume of digital image data has grown explosively in recent years. Faced with massive image data, how to annotate, classify and retrieve images accurately and efficiently has become an important problem in intelligent information processing. The key lies in image semantic representation and metric learning. The mainstream approach is to generate a visual dictionary from local features, build a middle-level semantic representation model, express image semantics with it, and then measure the distance between images. This dissertation therefore studies several ways of generating a visual dictionary, including K-Means clustering, hash mapping and learning coding. It then investigates the corresponding middle-level semantic models, namely bag of visual words, the visual language model and the learning coding model, and proposes several new image semantic representation methods. Finally, metric learning methods are discussed. The main contributions fall into the following six aspects:

(1) Visual dictionary generation based on K-Means clustering is studied. Some keypoints are merely noise and are unfit to represent visual content, and the dictionary contains visual stop words, both of which weaken the representation of visual content. A visual dictionary generation method based on keypoint removal and the chi-square model is therefore proposed. First, similar keypoints are grouped according to their coordinates, angle and scale, which reduces computational consumption and improves the representativeness of the keypoints. Then, approximate K-Means is used to generate the visual dictionary, and the chi-square model is introduced to analyze the correlation between visual words and image categories so that the stop words can be eliminated.
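The chi-square filtering idea in (1) can be illustrated with a minimal sketch. The abstract does not give the exact statistic or threshold used in the dissertation, so the binary word-presence matrix, the function names and the `threshold` parameter below are assumptions made for illustration only.

```python
import numpy as np

def chi_square_scores(presence, labels):
    """Chi-square statistic of each visual word against the image categories.

    presence: (n_images, n_words) array, True where the word occurs in the image.
    labels:   (n_images,) integer category of each image.
    Both inputs are hypothetical stand-ins for real bag-of-visual-words data.
    """
    presence = np.asarray(presence, dtype=bool)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_images = presence.shape[0]

    # Number of images in each category.
    class_sizes = np.array([(labels == c).sum() for c in classes], dtype=float)
    # observed_present[c, w]: images of category c that contain word w.
    observed_present = np.array(
        [presence[labels == c].sum(axis=0) for c in classes], dtype=float
    )
    observed_absent = class_sizes[:, None] - observed_present

    # Expected counts if word occurrence were independent of category.
    word_rate = presence.sum(axis=0) / n_images
    expected_present = class_sizes[:, None] * word_rate[None, :]
    expected_absent = class_sizes[:, None] * (1.0 - word_rate[None, :])

    eps = 1e-12  # avoids division by zero for words present in all or no images
    chi2 = (
        (observed_present - expected_present) ** 2 / (expected_present + eps)
        + (observed_absent - expected_absent) ** 2 / (expected_absent + eps)
    ).sum(axis=0)
    return chi2

def keep_discriminative_words(presence, labels, threshold):
    """Indices of words whose chi-square score exceeds the (assumed) threshold."""
    return np.flatnonzero(chi_square_scores(presence, labels) > threshold)
```

A word that occurs equally often in every category scores close to zero and is dropped as a stop word, while a word concentrated in one category scores high and is kept.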
Experimental results indicate that this method reduces computational consumption and effectively improves the distinguishability of the visual dictionary.

(2) On the basis of the visual dictionary generated above, the corresponding middle-level semantic model is studied. To decrease the quantization error and improve the expressive power of the visual vocabulary distribution histograms, a homoionym-based adaptive soft-assignment strategy is devised to generate the histograms, and an image semantic representation method combining homoionym-based adaptive soft-assignment with the chi-square model is presented. First, PLSA (probabilistic latent semantic analysis) is used to analyze the semantic co-occurrence probability of visual words, uncover the latent semantic topics in images, and obtain the latent topic distributions induced by the words. Second, the KL divergence is adopted to measure the semantic distance between visual words, which yields semantically related homoionyms. Then, adaptive soft-assignment is proposed to realize a soft mapping between SIFT features and several homoionyms. Finally, the chi-square model is introduced to eliminate the "visual stop words", reconstruct the visual vocabulary histograms and realize the image semantic representation. A nonlinear kernel SVM classifier and a spatially constrained similarity measure are used in object classification and object retrieval experiments. The results demonstrate that the new method reduces the quantization error, strengthens the image semantic representation, and further improves object classification and retrieval performance.

(3) Visual dictionary generation based on hash mapping is studied. To address the randomness of hash function selection and the unstable quality of the resulting visual dictionaries, a visual dictionary generation method based on weakly supervised E2LSH is proposed.
First, exploiting the locality sensitivity and high efficiency of E2LSH, SIFT features are hashed with E2LSH to generate a group of dictionaries. Then, inspired by the random forest idea and using the prior information of the training data, the selection of hash functions is supervised to reduce the randomness of E2LSH. The experimental results demonstrate that the method reduces the randomness of E2LSH, improves the distinguishability and stability of the dictionaries, and better overcomes the synonymy and ambiguity of visual words.

(4) On the basis of the visual dictionary generated by weakly supervised E2LSH, the corresponding middle-level semantic model is studied. A saliency map weighted language model is presented to weight and analyze the relevance of visual words in the background and foreground, and an image semantic representation method based on weakly supervised E2LSH and the saliency map weighted language model is proposed. First, the Graph-Based Visual Saliency (GBVS) algorithm is applied to detect the saliency map of each image. Second, the SIFT descriptors of the images are mapped to their nearest-neighbor visual words by the weakly supervised E2LSH, and the visual words are weighted according to the saliency prior. Finally, the saliency map weighted visual language model is used to express the semantic content of the image. The object classification and retrieval experiments demonstrate that this approach strengthens image semantic representation and markedly improves object classification and retrieval performance in complex environments.

(5) The visual dictionary based on learning coding and its corresponding middle-level semantic model are studied. Because sparse coding, the traditional learning coding method, is only a shallow learning model and its codewords lack selectivity for image features, an image semantic representation method based on a deep learning coding model is proposed.
First, an unsupervised RBM is used to generate the visual dictionary. Then, the unsupervised RBM learning is steered by a regularization scheme, which decomposes into a combined prior for the sparsity of each feature’s representation and the selectivity of each codeword. Finally, the dictionary is fine-tuned to be discriminative through supervised learning from top-down labels, and the deep learning representation of the image is obtained. The object classification experiments demonstrate that the dictionary is more compact, inference is fast, and classification performance is effectively improved.

(6) Metric learning is explored. Because current mainstream approaches have high computational complexity, which makes them difficult to apply to large-scale datasets, a distance metric learning method based on feature grouping and eigenvalue optimization is proposed. First, a feature grouping algorithm is introduced, which partitions image features into several groups according to the correlations between feature dimensions. Then, the semidefinite programming (SDP) problem is converted into an eigenvalue optimization problem under certain constraints, so that only the maximum eigenvalue of a matrix needs to be computed in each iteration. Experimental results indicate that the computational complexity and the learning time of the metric matrix are effectively reduced, and that object classification and retrieval results improve over traditional methods.
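To illustrate why the eigenvalue formulation in (6) is cheap per iteration, the sketch below solves one linear subproblem over positive semidefinite matrices with unit trace. It is a well-known fact that such a subproblem is maximized by the outer product of the top eigenvector, so no general SDP solver is needed. The abstract does not state the exact constraints used in the dissertation, so the unit-trace constraint and the function names here are assumptions.

```python
import numpy as np

def top_eigenpair(sym_matrix):
    """Largest eigenvalue and its unit eigenvector of a symmetric matrix."""
    # np.linalg.eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(sym_matrix)
    return eigenvalues[-1], eigenvectors[:, -1]

def linear_step_over_unit_trace_psd(gradient):
    """Maximize <gradient, M> over {M PSD, trace(M) = 1} (assumed constraint set).

    The maximizer is v v^T, where v is the top eigenvector of `gradient`,
    and the optimal value is the largest eigenvalue.
    """
    largest_eigenvalue, v = top_eigenpair(gradient)
    return largest_eigenvalue, np.outer(v, v)

# Usage with a hypothetical symmetric gradient matrix.
G = np.diag([1.0, 3.0, 2.0])
value, M = linear_step_over_unit_trace_psd(G)
```

In an iterative scheme, each step would call a routine like this once, which is consistent with the abstract's remark that only the maximum eigenvalue is computed per iteration. For large matrices, an iterative eigensolver could replace the full decomposition used here.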
Keywords/Search Tags: Image Semantic Representation, Bag of Visual Words, Probabilistic Latent Semantic Analysis Model, Exact Euclidean Locality Sensitive Hashing, Saliency Map Detection, Visual Language Model, Deep Learning, Restricted Boltzmann Machine