Font Size: a A A

Research Of The Efficient Index Structure For Approximate Nearest Neighbor Algorithm

Posted on:2016-05-03Degree:MasterType:Thesis
Country:ChinaCandidate:Z N SongFull Text:PDF
GTID:2308330470478502Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer and network technology, the Internet has gradually come into our life, so it’s important that how can we find the picture we want in the database. Therefore, the technology of image retrieval is emerged. In the present image storage technology, the dimension of a picture is reached hundreds or thousands of dimensional; At this point our traditional search will be degraded to the original order traversal search. To overcome this problem, the approximate nearest neighbor methods(ANN) are proposed. Meanwhile, the efficiency of the image retrieval is accelerated by high dimensional data index. Locality Sensitive Hashing(LSH) and its variants are well-known and effective way to solve the problem of approximate nearest neighbor queries in high-dimensional space. The traditional locality sensitive hashing algorithm is used by several synthetic "static" LSH function in combination. And we have also established hash table that combination of the hash function.However, there is locality sensitive hashing method has the following drawback: When visited the set of candidates, we need a lot of random I/O operations. In order to ensure the quality of results returned, a large number of objects that need to be matched, this will generate a lot of I/O consumption. On the other hand, in order to reflect the status of the data points in the mapping space, we use the Hamming distance to measure the distance, but it may destroy the original feature space neighborhood structure, it violates the basic goal of locality sensitive hashing.In this article, in order to solve the above problems we propose a new method called ManhattanSort LSH (MS-LSH) to use Manhattan distance theory to value the distance that points corresponding to the compound hash keys, and then establish the index structure. Firstly, in order not to destroy the structure of the original feature space, we propose the Manhattan distance to make the data point distance state can be well protected which are mapped from the original space. Secondly, in order to reduce the large I/O overhead, we created compound hash key on the linear order relations and the corresponding data points can be sorted and the data points close to each other may be stored in a same file index. We can reduce the number of access times to unnecessary index files to reducing I/O overhead. The experimental results show that the proposed method can reduce the overhead of the I/O with the guarantee of the correctness of the results.
Keywords/Search Tags:Approximate nearest neighbor seareh, Manhattan distanee, Locality sensitive hashing, Index Structure
PDF Full Text Request
Related items