Font Size: a A A

Research On Similarity Image Retrieval Based On Locality Sensitive Hashing And Structured P2P Network

Posted on:2020-01-19Degree:MasterType:Thesis
Country:ChinaCandidate:L ShenFull Text:PDF
GTID:2428330590995503Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Similarity image search problem refers to the problem of returning data points close to the query points in a given data set.Nearest Neighbor(NN)search methods show good query effect when the dimension of data points is low.Due to the Curse of Dimensionality problem,NN search method becomes extremely difficult for high-dimensional data points.In order to improve the efficiency of data search,Approximate Nearest Neighbor(ANN)search methods are proposed.Locality Sensitive Hashing(LSH)and its variants are well-known schemes to solve approximate neighbor search problems in high-dimensional space.Traditionally,these indexing schemes are centrally managed,requiring multiple hash tables to ensure search quality.However,due to the limitation of storage space,those schemes become unrealistic for massive data objects in the centralized indexing schemes.Therefore,the distributed indexing schemes based on P2 P network are proposed,and how to ensure the load balance of nodes in distributed network has become one of research hotspots.Facing the above characteristics and some problems of existing schemes,the load balancing of similarity image indexing is studied in this thesis.Firstly,the data distribution model of distributed indexing based on LSH is studied.Considering the difference of norms of data feature in data sets,the theoretical models of data distribution based on homogeneous norms and heterogeneous norms are constructed respectively,and those data distribution models are proved that they can accurately predict the distribution of hash values of distributed indexing data.Unlike existing models based on multiple hash tables,the proposed model is suitable for single hash table,which can significantly reduce the space occupied by creating distributed indexes.Then,based on LSH distributed indexing data distribution model,a static load balancing of distributed similarity image indexing scheme is proposed.Considering the characteristics of cumulative distribution function,a new index mapping method is proposed in the indexing scheme,which can guarantee the load balancing of each node in the data mapping phase probabilistically.The indexing scheme proposed in this thesis benefits from the nature of normal distribution,which can make the Chord-style P2 P network obtain more balanced load.Furthermore,a dynamic load balancing algorithm based on virtual nodes is proposed to solve the problem of load imbalance caused by node dynamics and index mapping errors.This algorithm can realize dynamic load adjustment between overloaded nodes and light-loaded nodes in P2 P network,which can keep the load balancing of each node.This algorithm enlarges the scope of application of LSH-based distributed indexing data distribution model,and makes the static distributed similarity indexing scheme more practical.Finally,the efficiency and effectiveness of distributed similarity image indexing scheme and dynamic load balancing algorithm are evaluated experimentally using synthetic data sets and two real data sets.The research results of this thesis have important theoretical significance and broad application value in the field of massive high-dimensional image data search and distributed P2 P network technology.
Keywords/Search Tags:Locality Sensitive Hashing, Similarity Search, P2P network, Load Balancing, High dimensional space
PDF Full Text Request
Related items