
Research On Technologies For Image Annotation

Posted on: 2016-06-22    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z J Lin    Full Text: PDF
GTID: 1318330536950228    Subject: Computer Science and Technology
Abstract/Summary:
With the explosion of image data in recent years, effective image retrieval methods have become a necessity. Existing text-based and content-based image retrieval methods both have weaknesses: the former cannot handle images without any surrounding text, while the latter may return visually similar but semantically irrelevant images due to the so-called “semantic gap” between low-level image features and high-level semantic concepts. To overcome these weaknesses, image annotation has been proposed, aiming to associate an image with several textual tags that describe its semantic content well. By adding tags to images, text-based image retrieval is enabled, and the “semantic gap” in content-based image retrieval can be better bridged via mid-level textual tags. Owing to this promise, image annotation has attracted much attention from academia. Although it has been studied for years and many effective methods have been proposed, image annotation still faces challenges such as the following.

1) How to reduce the training costs of model-based image annotation methods in cases with high-dimensional label spaces, i.e., large numbers of tags?
2) How to perform efficient nearest neighbour retrieval over large-scale image data for data-driven image annotation methods, and then fully exploit the information in the retrieved neighbours for annotation?
3) How to better annotate images that already carry a few initial tags, which can provide useful information for performance improvement but may also contain noise?

In this thesis, we propose several methods for tackling the above challenges, and experimental results demonstrate their effectiveness and reasonableness. The main contributions of our work are as follows.

1. To reduce the training costs of model-based image annotation methods in cases with high-dimensional label spaces, we propose a novel method, termed FaIE, that performs label space dimension reduction (LSDR) via Feature-aware Implicit label space Encoding. LSDR reduces training costs by encoding the original label space into a low-dimensional latent space, thereby reducing the number of predictive models required. FaIE performs implicit label space encoding by directly learning the latent space, without any assumptions about the encoding process, and it jointly maximizes the recoverability of the original label space and the predictability of the to-be-learnt latent space. Experimental results show that FaIE can substantially reduce training costs while keeping acceptable performance. A toy sketch of the encode-predict-decode pipeline underlying LSDR is given below.
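The following is a minimal, hypothetical sketch of the generic LSDR pipeline that FaIE instantiates: encode the training tag matrix into low-dimensional codes, train one regressor per code dimension instead of one per tag, and decode predicted codes back into tag scores. The PCA encoder, the ridge regressors, and all variable names are illustrative assumptions; FaIE itself learns the latent codes with a feature-aware objective that couples label recoverability with feature-based predictability, not by plain PCA on the labels.

```python
# Hypothetical sketch of a label space dimension reduction (LSDR) pipeline.
# The PCA encoder and ridge regressors are illustrative stand-ins; FaIE itself
# learns the latent code matrix with a feature-aware objective.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))               # image features (n x d)
Y = (rng.random((200, 1000)) < 0.02) * 1.0   # binary tag matrix (n x L), L large

c = 20                                  # latent code dimensionality, c << L
encoder = PCA(n_components=c).fit(Y)    # encode: label space -> latent space
Z = encoder.transform(Y)                # latent codes (n x c)

# Train c predictive models instead of L (this is where training cost drops).
models = [Ridge(alpha=1.0).fit(X, Z[:, j]) for j in range(c)]

def annotate(x_new, top_k=5):
    """Predict latent codes, decode back to tag scores, return top-ranked tags."""
    z_hat = np.array([m.predict(x_new[None, :])[0] for m in models])
    y_hat = encoder.inverse_transform(z_hat[None, :])[0]   # decode: latent -> labels
    return np.argsort(-y_hat)[:top_k]

print(annotate(rng.normal(size=64)))
```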
2. To perform efficient nearest neighbour retrieval over large-scale image data for data-driven image annotation methods, we propose a Semantics-Preserving Hashing method, termed SePH. SePH is an approximate nearest neighbour retrieval method that projects image features into binary hash codes and uses fast bit operations to compute their Hamming distances for measuring image similarities. Specifically, SePH first learns a semantics-preserving Hamming space by transforming the semantic affinities of training images and their hash-code-based similarities into two probability distributions and minimizing the Kullback-Leibler divergence between them. SePH then learns effective hash functions that project image features into the learnt Hamming space to generate the corresponding binary hash codes. Experimental results show that SePH can retrieve semantically relevant neighbours at much lower time costs, and it can serve as a general framework for single-feature, multi-feature and cross-feature retrieval.

3. To better exploit the information of nearest neighbours for performance improvement, we propose a new method termed TagRS. TagRS builds on the observation that different to-be-annotated images can have different optimal numbers of nearest neighbours, and that even different candidate tags for the same to-be-annotated image may favour different selections of neighbours for predicting their relevance. Therefore, instead of using a fixed number of neighbours, TagRS introduces a range constraint on the neighbour quantity. Moreover, for each candidate tag, TagRS estimates its trust degrees in all neighbours via a tag-dependent random search process, and combines them with the weighted votes that the neighbours cast for the candidate tag to predict its relevance to the to-be-annotated image. Experimental results show that TagRS is effective and that its performance is less sensitive to the neighbour quantity setting.

4. To better annotate images that carry a few initial tags, we propose a tag completion method using Dual-view (i.e. image-view and tag-view) Linear Sparse Reconstructions, termed DLSR. For a to-be-annotated image, DLSR performs image-view linear sparse reconstructions with other images to exploit image-image correlations and obtain an image-view tag completion result from the tags of those images. Moreover, DLSR performs tag-view linear sparse reconstructions for each tag to exploit tag-tag correlations and obtain a tag-view tag completion result from the initial tags of the to-be-annotated image. Both results are then combined under a meta-search framework to better select the missing related tags. Experimental results show that DLSR is effective and reasonable. A toy sketch of the two reconstruction views follows.
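Below is a minimal, hypothetical sketch of the dual-view sparse reconstruction idea behind DLSR, using sklearn's Lasso as a generic L1-regularized reconstruction solver. The data matrices, regularization weights, and the simple normalized averaging used to combine the two views are all illustrative assumptions, not DLSR's exact formulation or its meta-search combination.

```python
# Hypothetical sketch of dual-view linear sparse reconstruction for tag completion.
# Lasso is a stand-in L1-regularized solver; the weighting and combination scheme
# are simplified assumptions rather than DLSR's exact formulation.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, m = 100, 32, 50                    # images, feature dims, tags
X = rng.normal(size=(n, d))              # image features
T = (rng.random((n, m)) < 0.1) * 1.0     # tag matrix (rows: images, cols: tags)

q = 0                                    # index of the to-be-completed image
others = np.arange(n) != q
t_init = T[q].copy()                     # its few, possibly noisy initial tags

# Image view: sparsely reconstruct the query's features from other images,
# then propagate those images' tags using the reconstruction weights.
w_img = Lasso(alpha=0.05, positive=True).fit(X[others].T, X[q]).coef_
score_img = T[others].T @ w_img          # image-view tag scores (length m)

# Tag view: sparsely reconstruct each tag column from the other tag columns
# (tag-tag correlations), then score missing tags from the query's initial tags.
score_tag = np.zeros(m)
for j in range(m):
    rest = np.arange(m) != j
    w_tag = Lasso(alpha=0.01, positive=True).fit(T[:, rest], T[:, j]).coef_
    score_tag[j] = t_init[rest] @ w_tag

# Combine both views (a simple normalized average standing in for the
# meta-search combination) and keep the top-ranked missing tags.
def norm(s):
    return s / (s.max() + 1e-12)

combined = 0.5 * norm(score_img) + 0.5 * norm(score_tag)
combined[t_init > 0] = -np.inf           # rank only tags the image does not yet have
print(np.argsort(-combined)[:5])
```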
Keywords/Search Tags: image annotation, label space dimension reduction, nearest neighbour retrieval and mining, tag completion