
Design And Implementation Of Distributed Similarity Data Storage Based On Spectral Hashing

Posted on: 2017-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: S J Huang
Full Text: PDF
GTID: 2308330491451707
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the internet, the amount of data generated daily has exploded. Processing massive amounts of high-dimensional data for querying is a key challenge for further applications. However, such data degrade the performance of traditional distributed storage schemes, especially in terms of storage and switching overhead. Furthermore, similarity search queries are widely used in cloud computing applications. How to address these issues remains an open problem.

In this article, a distributed similarity data storage approach based on Spectral Hashing is proposed to meet the requirements of similarity queries.

First, similar high-dimensional data are projected to similar hash codes using the Spectral Hashing method. Then, the hash codes are mapped onto the distributed hash table using a hashing map based on the Cauchy distribution, so that similar data have a high probability of being stored at nearby locations on the distributed hash table. At the same time, the nodes on the distributed hash table in the storage network can adaptively adjust their storage load to meet the load-balancing requirement using the consistent hashing method. As a result, the switching overhead is significantly reduced when querying similar data, which improves query speed and similarity accuracy.

Extensive simulations are conducted using real-world data. The results show that the proposed method achieves high query efficiency by reducing the switching overhead.
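The abstract gives no formulas for the mapping step, so the following is only a minimal sketch of how a binary hash code might be placed on a consistent-hashing ring, not the thesis's implementation. Everything here is an assumption made for illustration: the names `cauchy_vector`, `ring_position`, `Ring` and `RING_SIZE`, the use of a single Cauchy-weighted projection squashed by the Cauchy CDF, and SHA-1 for node placement. The Spectral Hashing step that produces the binary codes, and the adaptive load adjustment, are omitted.

```python
import bisect
import hashlib
import math
import random

RING_SIZE = 2 ** 32  # hypothetical size of the DHT identifier space


def cauchy_vector(n_bits, seed=0):
    """Draw n_bits i.i.d. standard Cauchy coefficients (inverse-CDF sampling)."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_bits)]


def ring_position(code, coeffs):
    """Map a binary code to a ring position (assumed scheme, not from the thesis).

    The code is projected onto the Cauchy coefficients. Two codes that differ
    in d bits have projections whose difference is Cauchy-distributed with
    scale d, so codes at small Hamming distance tend to land close together.
    The Cauchy CDF then squashes the projection monotonically into [0, 1).
    """
    projection = sum(a * bit for a, bit in zip(coeffs, code))
    scale = len(code) or 1
    fraction = math.atan(projection / scale) / math.pi + 0.5
    return min(int(fraction * RING_SIZE), RING_SIZE - 1)


class Ring:
    """Consistent-hashing ring: a key is stored on its clockwise successor."""

    def __init__(self):
        self._positions = []  # sorted node positions on the ring
        self._names = {}      # node position -> node name

    def add_node(self, name):
        pos = int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING_SIZE
        if pos not in self._names:
            bisect.insort(self._positions, pos)
        self._names[pos] = name

    def lookup(self, pos):
        # First node at or after pos, wrapping around to the start of the ring.
        i = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._names[self._positions[i]]


if __name__ == "__main__":
    coeffs = cauchy_vector(8, seed=1)
    ring = Ring()
    for node in ("node-a", "node-b", "node-c"):
        ring.add_node(node)
    for code in ([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 0, 0, 1, 1]):
        pos = ring_position(code, coeffs)
        print(code, pos, ring.lookup(pos))
```

Because the Cauchy distribution is heavy-tailed, a single bit flip can occasionally move a code far along the ring, so proximity holds only with high probability, which matches the abstract's wording. Adding a node to this ring reassigns only the keys that fall between the new node and its predecessor.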
Keywords/Search Tags: Spectral Hashing, Similarity data storage, Cauchy Distribution, Distributed Storage