
Large-scale Visual Pattern Learning For High-performance Image Representation

Posted on: 2015-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D R Li
Full Text: PDF
GTID: 1268330428999922
Subject: Information security
Abstract/Summary:
With the spread of digital devices and smartphones, and with the popularity of social networks and online photo sharing, the number of web images keeps growing, and so does the demand for applications built on them. Large-scale image data and its associated applications pose a great challenge, and also a good opportunity, for research in image recognition, including object detection, image classification and image retrieval.

In the past few years, object retrieval has been a hot topic in image retrieval. The sparse image representation generated by a large vocabulary is well suited to fast search. By studying visual pattern learning in the local feature space and image representation, we can generate high-performance image representations rapidly and thereby contribute to a better image retrieval system.

To recognize large-scale images, visual attribute learning and mid-level image representation have become hot research topics in recent years. We studied the learning of visual attributes and the generation of mid-level representations, in order to learn large-scale attributes rapidly and generate high-performance mid-level representations for recognition and retrieval.

Our contributions and novelty are summarized as follows.

(1) To address the bottleneck of existing large-scale image retrieval systems, we proposed an algorithm for the fast construction of a high-performance visual vocabulary. A large-scale image retrieval system depends on a large vocabulary to generate sparse representations, indexed by an inverted table, for fast and exact search. Exploiting the inheritance of visual patterns across the iterations of the approximate algorithm, we proposed a robust approximate algorithm with rapid, guaranteed convergence. The proposed algorithm incurs almost no additional time or memory cost. Theoretical proofs guarantee that it converges to a converged solution of the exact algorithm.
Experimental results show that our algorithm is about 10 times faster than the available state-of-the-art algorithm at generating equivalent vocabularies. With it, a large-scale image retrieval system can easily generate an even larger high-performance vocabulary, which effectively supports both the search speed and the performance of the retrieval system. In addition, the proposed algorithm is used in other visual pattern discovery tasks to construct sets of visual patterns rapidly.

(2) For generating image representations in a large-scale image retrieval system, we proposed a high-performance, parameter-insensitive algorithm for quantizing local features and generating the image representation. Exploiting the locality of the Gaussian kernel function, the algorithm minimizes the kernel reconstruction error. It makes better use of more neighbors to generate a high-performance, sparse image representation; the learned quantization weights draw more information from the distances, so the image representation is less sensitive to the neighbor-number parameter.

(3) For the representation of general images, we proposed an indirect method, motivated by linear representation, to learn large-scale latent visual attributes rapidly and generate a high-performance image representation. In attribute-based mid-level representation, most existing works concatenate the outputs of attribute models into a long vector as the representation. We instead proposed to learn visual attributes indirectly by learning a single semantic subspace. The subspace learning algorithm rapidly learns large-scale latent visual attributes into the semantic subspace. The semantic subspace is rich in semantic concepts, so the linear representation generated by linear projections performs well.
In addition, the linear projections are semantic-aware and can be manually labeled with descriptions.

(4) Also for the representation of general images, we proposed a nonlinear representation based on visual attributes. All works that use a linear form share the shortcoming that they cannot utilize all the information in the attribute models. The proposed representation scheme is motivated by nonlinear representations used in other problems. The scheme sets requirements for three procedures, namely attribute definition, attribute model learning, and representation generation: each attribute is defined as a highly biased binary classification; the support vector machine is the recommended learning model; and the representation is generated by a nonlinear mapping with a proper scale value as its parameter. The experiments show that the nonlinear representation improves the representation significantly.

With the first two works, we proposed a scheme for generating high-performance sparse representations, which ensures that a large-scale image retrieval system can generate high-dimensional sparse representations rapidly. The last two works study visual attributes and mid-level representation from the viewpoints of both linear and nonlinear representation. The proposed method for fast learning of linear representations and the proposed scheme for generating high-performance nonlinear representations should be helpful for future work on visual attributes and high-performance mid-level representation.
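
As an illustration of the retrieval pipeline behind contributions (1) and (2), the following minimal Python sketch quantizes local features against a visual vocabulary using Gaussian-kernel weights over the nearest visual words, accumulates them into a sparse image representation, and searches through an inverted table. It is not the dissertation's algorithm: the weights here are plain normalized Gaussian kernel values rather than the solution of a kernel reconstruction error minimization, and the function names, the toy 2-D vocabulary, and the parameters k and sigma are all assumptions made for this example.

```python
import math
from collections import defaultdict


def soft_quantize(feature, vocabulary, k=2, sigma=1.0):
    """Map one local feature to {word_id: weight} over its k nearest visual words.

    Weights are normalized Gaussian kernel values (an assumed baseline,
    not the reconstruction-error-minimizing weights of the dissertation).
    """
    nearest = sorted(
        (sum((f - w) ** 2 for f, w in zip(feature, word)), idx)
        for idx, word in enumerate(vocabulary)
    )[:k]
    d_min = nearest[0][0]  # subtract the smallest distance for numerical stability
    kernel = {idx: math.exp(-(d2 - d_min) / (2.0 * sigma ** 2)) for d2, idx in nearest}
    total = sum(kernel.values())
    return {idx: value / total for idx, value in kernel.items()}


def image_representation(features, vocabulary, k=2, sigma=1.0):
    """Accumulate quantization weights into a sparse, L2-normalized vector."""
    hist = defaultdict(float)
    for feature in features:
        for idx, weight in soft_quantize(feature, vocabulary, k, sigma).items():
            hist[idx] += weight
    norm = math.sqrt(sum(v * v for v in hist.values()))
    return {idx: v / norm for idx, v in hist.items()} if norm > 0 else {}


def build_inverted_table(representations):
    """word_id -> list of (image_id, weight); only non-zero entries are stored."""
    table = defaultdict(list)
    for image_id, rep in representations.items():
        for word_id, weight in rep.items():
            table[word_id].append((image_id, weight))
    return table


def search(query_rep, table):
    """Score only images that share at least one visual word with the query."""
    scores = defaultdict(float)
    for word_id, q_weight in query_rep.items():
        for image_id, weight in table.get(word_id, []):
            scores[image_id] += q_weight * weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): four 2-D "visual words" and two database images.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
database = {
    "a": image_representation([(0.1, 0.1), (9.8, 0.3)], vocabulary),
    "b": image_representation([(0.3, 9.8), (9.9, 10.2)], vocabulary),
}
table = build_inverted_table(database)
query = image_representation([(0.5, 0.2)], vocabulary)
results = search(query, table)
print(results)
```

Because each image touches only a few visual words, the search visits only the inverted-table entries for the query's words, which is why a larger vocabulary (and hence a sparser representation) tends to speed up retrieval.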
Keywords/Search Tags: large-scale image retrieval, visual vocabulary construction, fast and robust clustering, quantization, mid-level representation, visual attributes, subspace learning, nonlinear representation