Font Size: a A A

Research And Implementation Of Hash Algorithm Based On Iterative Principal Component Analysis

Posted on:2019-11-10Degree:MasterType:Thesis
Country:ChinaCandidate:Z H LiFull Text:PDF
GTID:2428330566969768Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the advent of the era of big data and the increasing amount of data processing,data searches in large-scale data have become increasingly important.Nearest Neighbor Search is a routine operation in data retrieval and has received extensive attention.From the current point of view,the most classic nearest neighbor search algorithm is Location sensitive hash(LSH).Location-sensitive hashing algorithms have strict theoretical proofs,the algorithm is simple to implement and has a stable probability guarantee.However,LSH does not make full use of the distribution information of the data set.In the higher-dimensional data set,the query effect of LSH drops significantly.In order to make full use of the distribution information of the data set and improve the query effect,many scholars proposed a learning-based hash algorithm.The learning-based hash algorithm partitions the data set by training or extracts the main components of the data set,such as Data-Oriented Locality Sensitive Hashing(DSH)combines principal component analysis(PCA)and location-sensitive hashing to solve the nearest neighbor search problem.However,there is no strict theoretical guarantee for the success rate of the DSH algorithm.In a recent theoretical paper,an iterative PCA method was proposed.The iterative PCA method repeatedly performs PCA on the data set,which theoretically proves that the number of iterations is limited.However,the parameters of the iterative PCA method are difficult to determine,the definition of the distance from point to space is irrational,and the algorithm is complicated to implement.In order to fully utilize the distributed information of the data set,the algorithm is simple to implement,and the strict theoretical guarantee is achieved,this paper proposes an iterative PCA hash algorithm combining the advantages of the DSH algorithm and the iterative PCA method.The main research contents and contributions of this article are as follows:First,this paper presents an iterative PCA hash algorithm.Specifically,iterative PCA hash algorithm is proposed by solving iterative PCA method parameters difficult to set,point-to-space distance definition unreasonable two major problems,learning from the theoretical proof of iterative PCA method and the implementation of DSH algorithm.Theoretical analysis of thefeasibility of iterative PCA hash algorithm from point to space and the finite number of iterations.Secondly,the experimental iterative PCA hash algorithm query results are good.Specifically,it uses Python to implement LSH algorithm,DSH algorithm and iterative PCA hash algorithm.The analysis of the experimental results shows that(1)Iterative PCA hashing algorithm is more accurate than LSH algorithm and DSH algorithm on the premise of traversing the same candidate points.(2)Iterative PCA hashing algorithm does not have singularity(0,0),indicating that the dataset is more densely and uniformly distributed after being hashed by the iterative PCA hashing algorithm.(3)Iterative PCA Hash Algorithm experimental results converge faster to a higher precision position and reduce the time overhead for finding a suitable threshold w.
Keywords/Search Tags:Neighbor search, iterative PCA, location-sensitive hash, experimental verification
PDF Full Text Request
Related items