
Research On Key Issues Of Image Classification And Annotation By Fusing Text Information

Posted on: 2017-04-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 1108330485460296
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet and imaging technology, more and more information is expressed through images. For example, news websites often include images, and products in online shops are usually presented with pictures. Image information is increasingly becoming a major part of network content. An efficient, reliable, and intelligent system for image classification and annotation is therefore needed to help users find the most useful content quickly and conveniently. Because classifying or annotating images from visual features alone is difficult, researchers have tried to transfer knowledge from auxiliary domains to the image domain. Text information, such as image attributes, image annotations, surrounding documents, and related text descriptions, is cheap and easy to collect from web pages. Since text mining technology is also relatively mature, a natural idea is to perform image semantic understanding with the aid of text information. Adding prior text knowledge can improve the performance of image classification and annotation; moreover, the text can be obtained automatically without manual intervention, which saves manpower and improves efficiency. This dissertation studies image classification and annotation with the aid of text information. It takes image and text information fusion as its main line, and its goal is to improve the performance of image classification and annotation. The main contributions are summarized as follows:

(1) Image annotation performance is affected by the size of the annotated dataset. When the dataset is small, annotation performance is often unsatisfactory. We propose an image annotation model based on a semi-supervised low-rank mapping from the image visual feature space to the label space.
Semi-supervised learning uses both a small amount of annotated data and a large amount of unannotated data. We impose a manifold regularization that encodes the following requirement: if two instances are close in the feature space, their new representations under the mapping should also be close. In this way, the mapping captures the intrinsic geometric structure among instances in both the visual feature space and the annotation term space. The low-rank regularization of the mapping effectively exploits label and feature correlations. The model can also handle missing annotation terms, because it can fill in such missing entries using term correlations and the intrinsic structure of the data. Experiments on real-world multimedia datasets demonstrate that the proposed method exploits the term correlations and obtains promising annotation results that are better than those of state-of-the-art methods.

(2) To handle high-dimensional and noisy text-image data, we propose an image classification method based on text-to-image transfer learning in noisy domains. The robust model maps the image and text data to a common latent space. Meanwhile, the model employs two error matrices to describe the sparsely distributed noise in the text and image information. The common latent space is a reliable bridge for transferring accurate and useful knowledge from the text domain to the image domain. The model is able to identify a high-level feature space for representing the target domain data, so that an existing classifier can be trained and used to predict the labels of new data. An efficient iterative algorithm is developed to solve the proposed model, and its convergence analysis is given.
Experimental results on real datasets demonstrate that the proposed model is effective for text-to-image transfer learning in noisy domains.

(3) To perform image classification and annotation simultaneously, we propose a discriminative and sparse topic model that generates latent topics such that relevant visual words and annotation terms are identified and irrelevant ones are ignored. The label information is enforced in the generation of visual words and annotation terms, which guarantees that each latent topic consists of visual words or annotation terms that closely correspond to a category; i.e., the learned topics are more discriminative. A zero-mean Laplace distribution is added to the topic generative process, so that each topic contains only a few visual words and annotation terms and an image can be represented sparsely by the latent topics; i.e., the learned topics are sparse. The sparse image representation in the identified topic space helps in training a model and thus improves classification and annotation performance.

(4) To learn the relatedness between the image and text domains, we propose a novel method that automatically learns the transfer weights by building a directed cyclic network from co-occurrence data. To build the network from heterogeneous data, principal component analysis is first used to represent the co-occurrence data. The Markov chain Monte Carlo method is then employed to construct a directed cyclic network in which each node is a domain and each edge weight is the conditional dependence from one domain to another. A large (small) edge weight means that a large (small) amount of knowledge can be transferred from the source domain to the target domain.
The experimental results illustrate the effectiveness of the proposed method, which can capture strong or weak relations among domains and enhance learning performance in the image domain.

In summary, the main contribution of this dissertation is to improve the performance of image classification and annotation by fusing text information.
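The semi-supervised low-rank mapping of contribution (1) can be illustrated with a small sketch. The abstract does not state the objective or the solver, so everything below is an assumption made for illustration: a squared loss on the labeled rows, a graph-Laplacian manifold term over all rows, a nuclear-norm penalty as the low-rank regularizer, and proximal gradient descent with singular value thresholding. The function names and the parameters `lam`, `mu`, and `sigma` are hypothetical, and the handling of missing annotation terms mentioned in the abstract is not modeled.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Unnormalized Laplacian L = D - S of a dense Gaussian similarity graph (assumed graph type)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return np.diag(S.sum(axis=1)) - S

def svt(W, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(W, X, Y_l, n_labeled, L, lam, mu):
    """Assumed objective: labeled fit + manifold smoothness + nuclear norm."""
    fit = 0.5 * np.linalg.norm(X[:n_labeled] @ W - Y_l) ** 2
    F = X @ W  # mapped representation of every instance, labeled or not
    smooth = 0.5 * lam * np.trace(F.T @ L @ F)
    nuclear = mu * np.linalg.svd(W, compute_uv=False).sum()
    return fit + smooth + nuclear

def fit_low_rank_mapping(X, Y_l, n_labeled, lam=0.1, mu=0.5, n_iter=300):
    """Proximal gradient descent on the assumed objective.

    X holds all instances (the first n_labeled rows are annotated);
    Y_l holds the label matrix of the annotated rows.
    """
    X_l = X[:n_labeled]
    L = graph_laplacian(X)
    H = X_l.T @ X_l + lam * X.T @ L @ X      # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H)[-1]   # 1 / Lipschitz constant of the gradient
    B = X_l.T @ Y_l
    W = np.zeros((X.shape[1], Y_l.shape[1]))
    for _ in range(n_iter):
        grad = H @ W - B                     # gradient of the smooth part
        W = svt(W - step * grad, step * mu)  # low-rank proximal step
    return W
```

Under these assumptions, `X @ W` gives label scores for every instance, including the unannotated ones. The manifold term pulls the scores of nearby instances together, and the nuclear-norm penalty pushes `W` toward low rank, which is one common way to encode label correlations.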
Keywords/Search Tags: Image classification, Image annotation, Feature-level fusion, Transfer learning, Multi-view learning, Multi-label learning, Semi-supervised learning, Dimension reduction, Denoising, Relation between domains