Font Size: a A A

The Design And Analysis Of Sub-Linear Time Algorithm For Several Nearest-Neighbor Problems

Posted on:2022-09-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:H Z MaFull Text:PDF
GTID:1488306569485714Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
It's been almost 10 years since the concept of Big Data came up.As time goes by and the society evolves,the applicational area of computer science is facing up to TB,PB,or even EB size of data.For data in such size,even linear-time algorithms consume unacceptable amount of time.Thus,the theoretical researchers brought up the concept of Sub-linear Time Algorithms,in order to solve the massive data computing problems from the complexity point of view,and solve the problem from the root.In this thesis,we choose several important Nearest-Neighbors problems,design sub-linear time algorithm for them,and give rigorous proof for the sub-linear time complexities of the proposed algorithms.The problems studied in this thesis include All k-Nearest-Neighbors,Turing reduction of Approximate Nearest Neighbors,and Approximate k-Nearest-Neighbors under two approximation criteria.For these problems,we have the following contributions.First,we study the All k-Nearest-Neighbors problem.In the related work of this problem,we found one paper claiming that their algorithm for this problem has O(n log n)time upper bound.But after careful analysis,we found this upper bound is unachievable for their algorithm.We prove that the algorithm in that paper actually has ?(n2)time lower bound.Then we propose a new algorithm which overcomes the ?(n2)part and truly achieves O(n log n)time upper bound.Second,we study the Turing reduction of the Approximate Nearest Neighbor prob-lem.Among the extensive research works about Approximate Nearest Neighbor problem,the method via Turing reduction is the most recent and the most capable of achieving sub-linear time.In this method,another problem,denoted as(c,r)-NN is defined,and the Approximate Nearest Neighbor problem is solved via Turing reduction to(c,r)-NN.It is known that the(c,r)-NN problem has pseudo sub-linear time algorithm.Then if we can design a sub-linear time Turing reduction,the Approximate Nearest Neighbor problem can be solved in sub-linear time.There are three algorithms for such reduction in the existing research works.They have various complexities,but the query time complexity,which is the number of times to invoke the algorithm for(c,r)-NN,are all higher than O(log n).In this thesis,we propose a new Turing reduction algorithm from Approximate Nearest Neighbor problem to(c,r)-NN.The new algorithm achieves O(log n)query time,which is superior to all the existing works.Third,we study the Turing reduction of the Approximate Nearest Neighbor problem under two special cases.We first study the case that the Assouad dimension can be applied to the input.We propose a modified algorithm for such case,which reduces the O((d2/?)d)factor in the preprocessing time and space complexity to O((d/?)d).We then study the case that the input follows the Poisson Point Process.In such case we propose another dedicated algorithm,whose preprocessing time and space complexity are both O(n log n)in expectation.Forth,we study the Approximate k-Nearest-Neighbors problem under two approxi-mation criteria.In the research works of the Approximate k-Nearest-Neighbors problem,there are two kinds of approximation criteria,which are called distance criterion and recall criterion,respectively.We analyzed the existing works,and found that the recall criterion is only used for measuring the experimental results of the algorithm but no theoretical guarantee.For the distance criterion,only a few algorithms are proved to have partial guar-antee.In such circumstance,we propose the Approximate k-Nearest-Neighbors problem under two approximation criteria,and design a new algorithm for this problem.We prove that the output of the proposed algorithm can satisfy at least one of the two criteria in any cases.Further,there is a certain probability that the output satisfies both of the criteria.
Keywords/Search Tags:All k-Nearest-Neighbors, Approximate Nearest Neighbor, Approximate k-Nearest-Neighbors, Sub-linear time algorithm, Split method
PDF Full Text Request
Related items