Analysis And Optimization Of Visual Coding Method For Large Scale Image Retrieval

Posted on: 2016-03-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 1108330503455329
Subject: Computer application technology
Abstract/Summary:
Large scale image retrieval is one of the hot topics in computer vision. As an important multimedia search technology, it helps users find related images quickly. Visual coding methods aim to build compact and effective image representations, which directly affect time consumption, memory usage and retrieval performance. Visual coding methods for large scale image retrieval have been emerging steadily in recent years, such as bag of words (BOW), Fisher vector, VLAD and sparse coding (SC), and they provide strong support for efficient and effective retrieval.

Current visual coding methods also have shortcomings. Because a huge codebook is needed, the offline training of BOW takes a long time, which makes the image database hard to update. Traditional strong geometrical consistency checks for image re-ranking, such as RANSAC and PROSAC, suffer from low computing speed. Most visual coding methods compute image similarity in descriptor space and ignore the context information of the image. Product quantization (PQ) is sensitive to the distribution of the indexing vectors after PCA mapping is performed. The mathematical theory behind the pooling operation in visual coding is insufficient, and a more complete theory of pooling would also help in exploring how to optimize feature pooling.

To improve the retrieval performance of current visual coding methods, we combine distributed algorithms, context information, entropy coding and a probability interpretation model to optimize them. Finally, we apply our research to a mobile visual retrieval system. This thesis focuses on visual coding methods and their optimization. Its main contributions are as follows:

(1) To address the training speed and memory usage of the BOW model, we propose a VT algorithm based on distributed clustering, which achieves effective and efficient training of codebooks. A fast geometrical re-ranking method is then proposed to improve the retrieval performance.

(2) We study the fusion of context information with the VLAD coding method, and propose gravity-aware oriented coding and scale pooling. For the approximate nearest neighbor (ANN) search step in large scale image retrieval, we propose Oriented PQ, which is combined with Oriented Coding, and Variable PQ, which is based on entropy coding.

(3) We analyze a probability interpretation of the feature pooling operation under the sparse coding framework and introduce probability models of max pooling and sum pooling. Based on these models, we propose an optimized sum pooling method that combines the advantages of max pooling and sum pooling.

(4) We collect the Beijing Landmark dataset, which is tagged with GPS and gravity information. Using the analysis and optimization methods above, we build a city-scale mobile visual search system.

Experimental results show that the proposed methods solve the problem of fast training in BOW. Context information is sufficiently incorporated in the coding step, and feature pooling is well designed for sparse coding. All of these contribute to improved retrieval performance.
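For readers unfamiliar with the two baseline pooling operations compared in contribution (3), the following is a minimal sketch, not the optimized sum pooling method proposed in the thesis. It assumes each local descriptor has already been encoded as a non-negative sparse code over a small codebook; the function names and the toy values are hypothetical and chosen only for illustration.

```python
from typing import List


def sum_pool(codes: List[List[float]]) -> List[float]:
    """Sum pooling: add up each codeword's responses over all descriptors."""
    return [sum(column) for column in zip(*codes)]


def max_pool(codes: List[List[float]]) -> List[float]:
    """Max pooling: keep only the strongest response for each codeword."""
    return [max(column) for column in zip(*codes)]


# Hypothetical sparse codes (assumed non-negative):
# 3 local descriptors (rows) encoded over 4 codewords (columns).
codes = [
    [0.0, 0.5, 0.0, 0.25],
    [0.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

pooled_sum = sum_pool(codes)  # sensitive to how often a codeword fires
pooled_max = max_pool(codes)  # sensitive only to the peak response

print("sum pooling:", pooled_sum)
print("max pooling:", pooled_max)
```

In this toy case the second codeword fires twice with the same strength, so sum pooling doubles its weight while max pooling does not. That difference in how repeated responses are counted is the kind of trade-off a combined pooling scheme would need to balance.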
Keywords/Search Tags: large scale image retrieval, gravity information, geometrical information, product quantization, sparse coding