Font Size: a A A

Towards Performance Analysis Of Locality Sensitive Hashing

Posted on:2015-02-26Degree:MasterType:Thesis
Country:ChinaCandidate:J CaoFull Text:PDF
GTID:2268330425981971Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
With the advent of the era of big data, various fields such as images and multimedia databases, geographic information systems, data mining and analysis systems, massive data applications need for nearest neighbor search in large data sets. In another words, to find out one or a group of most similar objects with a given query object. It has an effective answer for the nearest neighbor search in low-dimensional space. While in high-dimensional data space, it is very difficult th find a pecfect solution due to the presence of "curse of dimensionality". The most effective technique to this problem is the establishment of high-dimensional index. Existing high-dimensional indexing structure can be divided into tree-based index and hash-based index. The tree-based indexing technology established tree-assisted query structure by dividing the space. It is better at lower data dimension, but when the dimension over dozens of dimensions, it easy to index space overlap and will resulting in inefficient indexes with large space. On hash-based indexing technoques, it can find similar objects in linear time through the hash function. Thus it gets widespread attention, research and applications.On the basis of extensive reading references, this paper conducted system study of location-sensitive hashing algorithm. In the study, we found tha the locality sensitive hashing algorithm theory and its practival implementation is inherently different.And that difference would leads to profiling errors.This paper studies the performance analysis of locality sensitive hashing. This paper mainly consisted of three parts:(1) This paper describes the essential difference between theoretical implementation of LSH and practical application. Using real-world data sets to simulate the algotithm both theoretical performance and actual performance analysis, then verified the difference between the two in experiment.(2)The traditional performance analysis of location-sensitive hashing algorithm is based on the premise doesn’t exist in actual application, which results in the theoretical and actual performance is not met. In the experiment, the recall rate of the actural algorithms will fluctuate up and down close to the theoretical value rather equal. To this end, we proposed a new performance analysis model; the model can accurately predict the actual performance of algorithm.(3)In order to verify the validity of the new model, we achieve the two algirithms base on E2LSH code---LSHc and LSHn. We use high-dimensional data such as Mnist, Color, and Audio. Then compare the recall rates ad collision rate of LSHc and LSHn. Experimental results show that the result generated by the traditional method exists a certain gap with actual results. While the new model proposed in this paper is an accurate predictive algorithm in a real application performance.
Keywords/Search Tags:Locality Sensitive Hashing, neighborsearch, analysis model, high-dimensionaldata
PDF Full Text Request
Related items