Research Of The Efficient Index Structure For Approximate Nearest Neighbor Algorithm

Posted on:2016-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z N Song

Full Text:PDF

GTID:2308330470478502

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of computer and network technology, the Internet has gradually come into our life, so it’s important that how can we find the picture we want in the database. Therefore, the technology of image retrieval is emerged. In the present image storage technology, the dimension of a picture is reached hundreds or thousands of dimensional; At this point our traditional search will be degraded to the original order traversal search. To overcome this problem, the approximate nearest neighbor methods(ANN) are proposed. Meanwhile, the efficiency of the image retrieval is accelerated by high dimensional data index. Locality Sensitive Hashing(LSH) and its variants are well-known and effective way to solve the problem of approximate nearest neighbor queries in high-dimensional space. The traditional locality sensitive hashing algorithm is used by several synthetic "static" LSH function in combination. And we have also established hash table that combination of the hash function.However, there is locality sensitive hashing method has the following drawback: When visited the set of candidates, we need a lot of random I/O operations. In order to ensure the quality of results returned, a large number of objects that need to be matched, this will generate a lot of I/O consumption. On the other hand, in order to reflect the status of the data points in the mapping space, we use the Hamming distance to measure the distance, but it may destroy the original feature space neighborhood structure, it violates the basic goal of locality sensitive hashing.In this article, in order to solve the above problems we propose a new method called ManhattanSort LSH (MS-LSH) to use Manhattan distance theory to value the distance that points corresponding to the compound hash keys, and then establish the index structure. Firstly, in order not to destroy the structure of the original feature space, we propose the Manhattan distance to make the data point distance state can be well protected which are mapped from the original space. Secondly, in order to reduce the large I/O overhead, we created compound hash key on the linear order relations and the corresponding data points can be sorted and the data points close to each other may be stored in a same file index. We can reduce the number of access times to unnecessary index files to reducing I/O overhead. The experimental results show that the proposed method can reduce the overhead of the I/O with the guarantee of the correctness of the results.

Keywords/Search Tags:

Approximate nearest neighbor seareh, Manhattan distanee, Locality sensitive hashing, Index Structure

PDF Full Text Request

Related items

1	Locality Sensitive Hashing Index Based On Neighborhood Collision Counting
2	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
3	Research On Local Sensitive Hashing And Approximate Nearest Neighbor Algorithm
4	Research Of Approximate Nearest Neighbor Search Based On Locality Sensitive Hashing
5	Reasearch On Locality Sensitive Hashing Based Approximate Nearest Neighbor(s) Searching Algorithm
6	Study On The Efficient Approximate Nearest Neighbor Search For Massive Data
7	Research Of Local Sensitive Hash Index Based On Nearest Neighbor Graph
8	Research Of Algorithm For Crowd Abnormal Behavior Detection Based On The Technology Of Approximate Nearest Neighbor Search
9	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data
10	Approximate Nearest Neighbor Search For High-Dimensional Based On Nearest Neighbor Graph