Recently, information retrieval based on data similarity has become a hot research direction. Information retrieval based on data similarity is implemented in many field of computer sciences. Using data similarity to retrieve information can be divide into two classes. One is nearest neighbor search, the other is approximate nearest neighbor search. Locality sensitive hashing is an approximate nearest neighbor search algorithm based on probability theory and Euclidean geometry. Yet there is some drawbacks in locality sensitive hashing. For example, locality sensitive hashing has low filtering effect to long distance data point, besides locality sensitive hashing needs to build lots of indexing table to ensure the accuracy rate of the algorithm. To construct better algorithm to solve the similarity search problem, this article present two-level locality sensitive search based on intrinsic locality sensitive hashing. The main research contents are as following:Studying the intrinsic Locality Sensitive Hashing. When the retrieval point is far from data point, the filtering effect of intrinsic locality sensitive hashing is not good enough. To solve this problem, we used space sphere lattice to partition the data set to improve the filtering effect of the algorithm to long distance point. We scaled the projection segment to lower the probability of short distance point being projected to neighbor segment. Calculated the collision probability of Locality Sensitive Hashing function. Calculate the upper bound of indices ?. It turn out that Two-Level locality sensitive hashing function is better than the origin Locality Sensitive Hashing. Used MATLAB to do the contrast experiment on the GIST descriptor of CIFAR-10 dataset. Compared the recall rate and precision rate with the existing locality sensitive hashing algorithm.Improved the distribution indexing table in allusion to the two-level locality sensitive hashing. Two-level distributed indexing table using the characteristic of two-level locality sensitive hashing algorithm that two-level locality sensitive hashing took two step to hashing the data point, design the distr ibution indexing table into two-levels. Two-level distribution indexing table improved the accuracy rate of locality sensitive algorithm and reduced the rate of in memory occupation. This content used Sougou lab's data set of 2012 version to test the two-level locality sensitive hashing. This content compared two-level locality sensitive hashing with other similarity search algorithm in aspect of retrieval performance. Compared the two-level locality sensitive hashing algorithm used two-level distribution indexing table with algorithm used single level distribution indexing table in aspect of retrieval time, retrieval accuracy, scale performance. It finds out that the method used in this content have good precision rate and speed rate. |