Font Size: a A A

Research On Hash Index Towards Product Image Search

Posted on:2014-02-22Degree:MasterType:Thesis
Country:ChinaCandidate:F WuFull Text:PDF
GTID:2248330398975213Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
With the continuous development of e-commerce, online shopping is chosen by more and more people because of its fast and convenient style, while consequently the number of network images is growing geometrically. Image searching problem upon similarity, how to quickly and efficiently return the users’demanded product in the vast shopping image dataset, has become one of the latest research focus in academia and industry. At present, several image searching systems have been existed such as taotaosou and antoso. When the user uploads an image, the time of returning results is relatively long, which difficultly satisfy the needs of user. In order to speed up the search, the construction of the index structure for the dataset in the system after the preprocessing is required. A good index structure can greatly improve the system retrieval speed.For this feature, a series of methods such as the tree index structure is presented to solve the image similarity matching. Fortunately these methods are typically applicable only in the case of lower dimensional feature vector while for high-dimensional situation their performances falls deteriorate. To solve the problem, an approximate nearest neighbor based searching algorithm, LSH, has been proposed. This method is suitable for high dimensional data with fast and efficient search performance, and is the most popular method for similarity search.This paper presents two fast and efficient locality sensitive hashing algorithms. These two algorithms can reduce query time in the case of certain search accuracy. The main work and contributions of this article can be stated as follows:First, a new random locality sensitive hashing algorithm is presented. The algorithm still follows the LSH thoughts and framework, and improves widespread defects of existing LSH algorithms. In building index structure, for each hash bucket, its internal feature vectors are stored in ascending order in accordance with bucket center distance of their respective hash bucket. In the query, the triangle inequality principle is used to prune vectors in the candidate buckets, so it reduces the times of distance calculations and improves query speed.Second, a new non-random locality sensitive hashing algorithm is presented. The algorithm creates a hash function on the original dataset according to the amount of information analysis dimension to improve the instability of random locality sensitive hashing. The information of dimension of a fixed dataset is always same; therefore the selection of the hash function is non-random so as to ensure the stability of the performance of the algorithm. In the query, multi-probe detection ideology is used, which makes neighboring buckets as candidate barrels in addition to respective hash bucket of the query point. In the meanwhile, triangle inequality pruning filter is used to speed up the search.Third, the two algorithms presented in this paper are verified by experiments. Every image is described as the global color feature and local SIFT feature, within which SIFT uses the BoW model to build vocabulary and makes each image correspond to only one vector. By comparing sequential scan which has the highest accuracy and two classic LSH algorithms, both algorithms presented in this paper have achieved better performances in time, space, recall, dimensions adaptability and stability.
Keywords/Search Tags:Index structure, Locality sensitive hashing, Triangle inequality, Information ofdimensions, Multi-probe detection
PDF Full Text Request
Related items