
Tensor Representation And Semantic Modeling For Image Annotation

Posted on: 2016-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z M Qian
GTID: 1108330509461073
Subject: Army commanding learn
Abstract/Summary:
Image annotation is one of the fundamental problems in image understanding and analysis. With the growth of personal and web image collections, image annotation has become increasingly important for helping users find images of interest. Although annotating images is easy for humans, it remains a challenging task in computer vision. One reason is that humans train the complicated neural networks in their brains from birth, whereas machines may be trained for only a few hours to understand images. Current image annotation systems face two main problems: the representation problem, which requires learning effective features from diverse image contents, and the semantic modeling problem, which requires annotation quality to approach the level of human annotations. To address these two problems, this paper investigates tensor representation and semantic modeling for image annotation. In addition, motivated by the practical needs of web users, this paper also studies the personalization of annotation models and the tag refinement of social images.
For image representation, this paper investigates global and local tensor representation methods that capture the high-order statistical and structural characteristics of images. For global tensor representation, the paper presents two methods, regularized nonnegative tensor representation (RNTR) and Laplacian regularized uncorrelated tensor representation (LRUTR), which use nonnegativity and uncorrelatedness regularization, respectively. To investigate how different global features affect semantic modeling, this paper also proposes a graph regularized nonnegative group sparsity (GRNGS) model, and the experimental results show that tensor representation provides much more discriminative features for semantic modeling.
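The abstract does not spell out how RNTR and LRUTR are computed. Tensor representations of this kind commonly treat an image as an H x W x C tensor and project each mode with its own factor matrix instead of flattening the image into one long vector. The NumPy sketch below shows only that generic multilinear projection step, assuming the factor matrices are already given; here they are random orthonormal placeholders, whereas RNTR or LRUTR would learn them under the nonnegativity or uncorrelatedness regularization described above, which this sketch does not implement. The function names are hypothetical.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    # Bring the target mode to the front and unfold the tensor into a matrix.
    moved = np.moveaxis(tensor, mode, 0)
    unfolded = moved.reshape(moved.shape[0], -1)
    # Multiply, fold back, and return the mode to its original position.
    projected = matrix @ unfolded
    new_shape = (matrix.shape[0],) + moved.shape[1:]
    return np.moveaxis(projected.reshape(new_shape), 0, mode)

def multilinear_project(image_tensor, factors):
    """Project every mode n of the tensor with U_n^T (one factor per mode)."""
    out = image_tensor
    for mode, u in enumerate(factors):
        out = mode_n_product(out, u.T, mode)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for a 32 x 32 RGB image
# Hypothetical factor matrices with orthonormal columns (reduced QR of random matrices).
factors = [np.linalg.qr(rng.standard_normal((n, k)))[0]
           for n, k in [(32, 4), (32, 4), (3, 2)]]
core = multilinear_project(image, factors)
feature = core.ravel()  # global feature vector passed on to semantic modeling
print(core.shape, feature.shape)  # (4, 4, 2) (32,)
```

Projecting mode by mode keeps the row, column, and color structure separate and needs far fewer parameters than one projection of the flattened image: 32*4 + 32*4 + 3*2 = 262 factor entries here, versus 3072*32 = 98,304 for a single 3072-to-32 projection matrix.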
For local tensor representation, a region descriptor is developed using a third-order statistical tensor, which better describes the contents of image blocks or segmented regions. Both theoretical analysis and experimental results validate the effectiveness of this descriptor.
For semantic modeling, this paper investigates topic models, region labeling, and hierarchical methods for image annotation based on global and local visual features. For topic modeling, this paper first presents an extended latent Dirichlet allocation (ELDA) model that uses global visual features. Building on this, a new model, class-specific Gaussian-Multinomial latent Dirichlet allocation (csGM-LDA), is presented to exploit the locality of image semantics. This model combines the advantages of topic models and supervised learning, improving both the discriminative power of the model and its scalability at test time. Experimental results demonstrate the effectiveness of class-specific modeling for topic models. For semantic modeling with local regions, a local high-order support tensor machine (LHSTM) model is proposed, which takes the high-order tensorial features of image regions directly as inputs and measures their similarities through compressed representations. In addition, a spatial energy based model (SEBM) is proposed that incorporates contextual information to refine labeling results. Comparing annotation methods that use different image representations, we observe that the region labeling methods usually perform better, which demonstrates the importance of region information for image annotation. Finally, by integrating global and local visual features, this paper proposes a multi-level method for image annotation. To reduce the mixing of different semantic levels, this hierarchical method uses only scene-level and object-level information, and applies a conditional random field (CRF) model to jointly model the different semantic levels.
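The local region descriptor is described only as a third-order statistical tensor. A minimal sketch, assuming it is the third-order central moment tensor of per-pixel feature vectors inside a block or segmented region, T[i, j, k] = mean over pixels of x_i * x_j * x_k; the choice of per-pixel features (color plus position below) and the function names are hypothetical, not taken from the thesis.

```python
import numpy as np

def third_order_descriptor(features, center=True):
    """Third-order statistical tensor of one region's per-pixel features.

    features: (N, d) array-like, one d-dimensional feature vector per pixel.
    Returns a symmetric (d, d, d) array T with
    T[i, j, k] = mean over pixels n of x[n, i] * x[n, j] * x[n, k].
    """
    x = np.asarray(features, dtype=float)
    if center:
        # Subtract the region mean so T holds third-order central moments.
        x = x - x.mean(axis=0)
    return np.einsum("ni,nj,nk->ijk", x, x, x) / x.shape[0]

def upper_entries(tensor):
    """Keep each distinct entry of a symmetric 3-tensor once (i <= j <= k)."""
    d = tensor.shape[0]
    return np.array([tensor[i, j, k]
                     for i in range(d) for j in range(i, d) for k in range(j, d)])

# Hypothetical region: 200 pixels, each described by (R, G, B, row, col).
rng = np.random.default_rng(0)
region = rng.random((200, 5))
T = third_order_descriptor(region)
vec = upper_entries(T)
print(T.shape, vec.shape)  # (5, 5, 5) (35,)
```

Because T is symmetric, only d(d+1)(d+2)/6 of its d^3 entries are distinct (35 of 125 for d = 5). A tensor-input classifier such as the LHSTM mentioned above could take T directly, while vector-based models could use the compact `vec`.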
Experimental results validate the advantage of using a hierarchy for image annotation. Moreover, the semantics learned by this hierarchical method better illustrate image contents through structural descriptions.
For personalized image annotation, this paper proposes a class-specific annotation model together with a personalization method, using both the annotations of a standard image database and those of users' own image datasets. Because a user's annotation vocabulary is usually small and different users have different visual understandings of a given tag, the paper first models the standard image database with a class-specific weighted nearest neighbor (cs-WNN) model. This model combines multiple kernel learning with nearest neighbor modeling and is highly effective for semantic modeling. Based on this model, a new personalization method, class-specific cross-domain learning (cs-CDL), is proposed to learn each user's own annotation profile from that user's image dataset. Experimental results show the effectiveness of the personalization method and demonstrate that personalization can further benefit image retrieval.
For tag refinement, this paper analyzes the different characteristics of data similarity and data co-occurrence, and proposes to exploit web image sources with a two-stage strategy that handles these two types of data relations separately. To address the sparsity problem, a graph learning (GL) method is first introduced to enrich the tagging data according to item similarities. Then, a nonnegative tensor factorization (NTF) method is developed to learn more coherent ternary relations among users, images, and tags, coupled with manifold constraints learned from item co-occurrences.
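The abstract does not give the NTF objective. The sketch below assumes a nonnegative CP (PARAFAC) model of a user x image x tag tensor X, fitted with Lee-Seung style multiplicative updates, and models the "manifold constraints" as optional graph Laplacian penalties lam * tr(F^T (D - W) F) on each factor F, in the style of graph-regularized NMF. The affinity matrices W (for example, built from co-occurrences) and all function names are hypothetical; this is not the thesis's exact algorithm.

```python
import numpy as np

def graph_regularized_ntf(X, rank, graphs=(None, None, None), lam=0.1,
                          n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization X ~ sum_r a_r o b_r o c_r.

    X: nonnegative (users, images, tags) array.
    graphs: per-mode nonnegative symmetric affinity matrices W, or None.
    Minimizes ||X - [[A, B, C]]||^2 + lam * sum_n tr(F_n^T (D_n - W_n) F_n).
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) + eps for dim in X.shape]
    # Contractions of X with the two other factors (MTTKRP) for each mode.
    specs = ["ijk,jr,kr->ir", "ijk,ir,kr->jr", "ijk,ir,jr->kr"]
    for _ in range(n_iter):
        for mode in range(3):
            others = [f for m, f in enumerate(factors) if m != mode]
            numer = np.einsum(specs[mode], X, *others)
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            denom = factors[mode] @ gram
            W = graphs[mode]
            if W is not None:
                # Laplacian L = D - W: W enters the numerator, D the denominator.
                D = np.diag(W.sum(axis=1))
                numer = numer + lam * W @ factors[mode]
                denom = denom + lam * D @ factors[mode]
            # Multiplicative update keeps every entry nonnegative.
            factors[mode] *= numer / (denom + eps)
    return factors

def reconstruct(factors):
    """Rebuild the full tensor from CP factors A, B, C."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Toy user x image x tag tensor built from hypothetical nonnegative factors.
rng = np.random.default_rng(1)
true_factors = [rng.random((6, 2)), rng.random((8, 2)), rng.random((5, 2))]
X = reconstruct(true_factors)
# Hypothetical image-image affinity (assumption; e.g. from tag co-occurrence).
W_images = np.ones((8, 8)) - np.eye(8)
A, B, C = graph_regularized_ntf(X, rank=2, graphs=(None, W_images, None), lam=0.01)
rel_err = np.linalg.norm(X - reconstruct([A, B, C])) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Refined tag scores for a user-image pair would then be read from the reconstructed tensor, where previously empty entries receive values supported by the low-rank ternary structure and the graph penalty.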
Experimental results show that the proposed method makes better use of both the similarity and the co-occurrence of web images, and thus improves tag refinement.
The work in this thesis helps us understand the properties of image annotation from the perspectives of image representation and semantic modeling. The proposed methods are also compared with the relevant state-of-the-art methods. Experimental results show that the proposed methods are effective, and the work has value in both theory and practice. We hope that the contributions of this thesis can serve as a guide and reference for related fields.
Keywords/Search Tags: image annotation, tensor representation, latent Dirichlet allocation, support tensor machine, hierarchy, cross-domain learning, graph learning, nonnegative tensor decomposition