Font Size: a A A

Learning-based Distributed Locality Sensitive Hashing

Posted on:2017-01-17Degree:MasterType:Thesis
Country:ChinaCandidate:C LiuFull Text:PDF
GTID:2348330512968174Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of computer and network technology,the dimension of data is increasing every moment.Unfortunately,it is always time consuming and inefficient to find the data we want in the database.Nearest Neighbor(NN)search in high dimensional space has become a fundamental paradigm in many applications like machine learning,data learning and pattern recognition.Locality sensitive hashing(LSH)has proved to be one of the most effective method for scalable high dimensional nearest neighbor search.(key,value)based distributed frameworks,such as MapReduce,Twitter Storm and spark are gaining increasingly widespread use in applications that process large amounts of data.Combining LSH in the background of(key,value)based distributed framework is recently a hot topic.However,the LSH scheme needs a rather large number of hash tables to ensure its efficiency.This entails a large space requirement,and in the distributed setting,with each query requiring a network call per hash bucket look up,this also entails a big network load.In the meaning while,to decrease the space and for the purpose of clarity,some methods use a single hash table and choose randomly hash functions,then it is hard to achieve high efficiency.This paper attempt in applying learning based algorithm in the background of(key,value)based distributed frameworks and using it to process NN queries with MapReduce.We summarize the contributions of the paper as follows:(1)We propose a novel access method,called LB-LSH,to implement Entropy LSH in the distributed(key.value)model,and provide its pseudo-code for distributed frameworks MapReduce.(2)LB-LSH can ensure the high efficiency in the case of only O(1)table and decrease the I/O consumption.(3)We present experimental results with LB LSH on Hadoop,and the proposed method can achieve better performance compared to the state-of-the-art hashing approaches.
Keywords/Search Tags:High Dimensions, Similarity Search, LSH, MapReduce
PDF Full Text Request
Related items