
Research On Key Technologies Of Image Automatic Annotation

Posted on: 2018-04-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Zang
Full Text: PDF
GTID: 1318330518996821
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of multimedia technology and the increasing portability of digital image acquisition devices, digital image resources are growing explosively, creating an urgent need for effective image management and retrieval. In existing text-based image retrieval systems, manual image annotation is time-consuming and laborious, and cannot cover massive image collections. Automatic image annotation lets the computer assign keywords that reflect an image's semantic content, which greatly improves the efficiency and accuracy of image retrieval. It has broad application prospects in image and video retrieval, scene understanding, human-computer interaction, and related fields. However, because of the semantic gap between visual features and semantic labels, image annotation remains a challenging task and has become an active research topic in computer vision.

This dissertation studies the key technologies of automatic image annotation. Building on machine learning theories and methods applied to computer vision, it examines four issues in depth: the effective integration of various visual features; similarity measurement between images; the effective use of the similarity and diversity of different views of an image; and the exploitation of deep learning features together with the handling of label imbalance in image datasets. The main contributions are summarized as follows:

1) Concatenating multiple features may lead to the "curse of dimensionality" and reduce the discriminative power of the features. To address this, we present two feature selection algorithms for image annotation based on Distance Constraint Sparse / Group Sparse Coding (DCSC/DCGSC).
Considering that the similarity of different feature atoms may contribute differently to the semantic similarity between images, a distance constraint regularization is defined and integrated with sparse / group sparse coding for feature selection, which encourages the selected feature atoms to be sparse / group sparse as well as semantically discriminative. Given a test image, its K nearest neighbors are found among the training images using the learned feature weights, and their labels are transferred to it. Experimental results on the Corel5K and IAPRTC12 datasets validate the effectiveness of the distance constraint for feature selection and image annotation.

2) Two problems are addressed here: the effective integration of multiple features with semantic information, and the unsuitability of traditional one-to-one similarity measurement for multi-label classification. A Multi-view Mix-norm Sparse Coding (MvMnSC) model for image annotation is presented, in which each type of feature and the semantic labels are each treated as a view in order to learn a sparse representation shared between the views. Furthermore, dictionary learning is integrated into the model to reduce the noise of the training samples and the computational complexity. To exploit the flexibility of L1-norm sparsity along with the structural prior of L1,2-norm sparsity, a mixed-norm regularization is introduced to balance the two, which favors learning the optimal dictionary and sparse representation adaptively. The alternating direction method of multipliers and the K-singular value decomposition algorithm are used to solve the optimization problem.
The label transfer scheme is simple, and experimental results on the Corel5K and IAPRTC12 datasets demonstrate the effectiveness of the proposed method compared with related approaches to image annotation, and show that the mix-norm constraint improves annotation performance.

3) To effectively utilize the similarity and diversity of multiple views of an image, we present a Multi-view Joint Sparse Coding (MvJSC) framework for image annotation. On the one hand, each view is allocated a distinct sparse coefficient representation, which allows flexibility across views; on the other hand, a joint sparse regularization term is introduced to enforce a similar sparsity pattern across the views. To further boost discriminative power, we extend this model to kernel space to capture the non-linear similarity of the data; accordingly, the accelerated proximal gradient algorithm and the kernel K-singular value decomposition algorithm are extended to solve the sparse coding and dictionary update problems in the multi-view kernel space. The corresponding label prediction algorithms are also proposed. Comparative experiments on three general datasets illustrate the importance of exploiting the similarity and diversity across multiple views concurrently, validate the joint sparsity constraint for this purpose, and show that kernel mapping further improves annotation performance.

4) To address the imbalanced distribution of labels in image datasets and the unsatisfactory performance of handcrafted features, we present a Multi-view learning model with Mixed Locality and Structure Sparsity constraint (MvMLS2), in which a multi-view locality constraint regularization is defined and integrated, along with a structured sparsity constraint, into a multi-view image annotation framework.
Furthermore, the structured sparsity constraint is also applied to the dictionary to adaptively exploit the similarity and diversity between multiple views. This model simplifies the optimization process and makes it easier to introduce more complicated features. We extract deep learning features from a pretrained convolutional neural network and feed them into multi-view learning along with the handcrafted features, bringing in more complementary information for discrimination. In addition, two weight matrices are defined based on label frequency to increase the weight of rare labels and of images annotated with rare labels. Experiments on the Corel5K and IAPRTC12 datasets show that the proposed method performs better when deep learning features and handcrafted features are used together, and that increasing the weights of rare labels raises the system's recall.
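The abstract does not give the formula for the two frequency-based weight matrices in contribution 4, so the following is only a minimal sketch of one plausible scheme, not the dissertation's definition. The function name `rare_label_weights`, the smoothing term `smooth`, the inverse-frequency rule, and the choice of giving each image the largest weight among its labels are all assumptions made for illustration.

```python
import numpy as np

def rare_label_weights(Y, smooth=1.0):
    """Hypothetical inverse-frequency weighting (an assumption, not the
    dissertation's formula).

    Y is an (n_images, n_labels) binary annotation matrix.
    Returns one weight per label and one weight per image.
    """
    Y = np.asarray(Y, dtype=float)
    freq = Y.sum(axis=0)                    # number of images carrying each label
    label_w = 1.0 / (freq + smooth)         # rarer label -> larger weight
    label_w = label_w / label_w.mean()      # rescale so the mean weight is 1
    has_label = Y.sum(axis=1) > 0
    # An image inherits the largest weight among its own labels;
    # images without any label get the neutral weight 1.
    image_w = np.where(has_label, (Y * label_w).max(axis=1), 1.0)
    return label_w, image_w

# Toy data: label 0 appears in three images, label 1 in only one.
Y = np.array([[1, 0],
              [1, 0],
              [1, 1]])
label_w, image_w = rare_label_weights(Y)
```

In this toy example the rare label and the single image annotated with it receive larger weights than the rest. This matches the stated intent of raising recall on rare labels, though the actual matrices in the dissertation may be built differently.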
Keywords/Search Tags: image automatic annotation, sparse coding, feature selection, multi-view learning, dictionary learning
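The mixed-norm regularization in contribution 2 balances element-wise L1 sparsity against a structured (group) sparsity term. The sketch below shows what such a penalty and its proximal operator can look like. It assumes the structured norm is the sum of row-wise Euclidean norms of the coefficient matrix, with one row per dictionary atom, and the names `mixed_norm`, `prox_mixed_norm`, `lam1`, and `lam2` are hypothetical. The dissertation solves its full objective with the alternating direction method of multipliers and K-SVD; this block illustrates only the penalty, not that solver.

```python
import numpy as np

def mixed_norm(A, lam1, lam2):
    """lam1 * ||A||_1 + lam2 * (sum of row-wise l2 norms of A)."""
    l1_term = np.abs(A).sum()
    group_term = np.linalg.norm(A, axis=1).sum()
    return lam1 * l1_term + lam2 * group_term

def prox_mixed_norm(A, lam1, lam2):
    """Proximal operator of the penalty above (unit step size).

    Element-wise soft-thresholding followed by row-wise group shrinkage,
    the standard result for a sparse-group penalty.
    """
    # Step 1: soft-threshold each entry by lam1 (L1 part).
    S = np.sign(A) * np.maximum(np.abs(A) - lam1, 0.0)
    # Step 2: shrink each row toward zero by lam2 (group part).
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam2 / np.maximum(norms, 1e-12), 0.0)
    return S * scale

# Toy coefficient matrix: the first row is strong, the second is weak.
A = np.array([[3.0, -0.5],
              [0.2,  0.1]])
shrunk = prox_mixed_norm(A, lam1=0.5, lam2=1.0)
```

Here the weak row is zeroed out as a whole (group sparsity), while the small entry in the strong row is zeroed individually (element-wise sparsity). This is the kind of balance between flexibility and structure that the abstract attributes to the mix-norm constraint.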