Font Size: a A A

Processing Top-N Queries Based On P-Norm Distances

Posted on:2015-02-04Degree:MasterType:Thesis
Country:ChinaCandidate:F F LiuFull Text:PDF
GTID:2268330422469995Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Along with information increasing dramatically and rhythm of life quickening in today’ssociety, it has been a problem that we have to face how to find available informationthrough masses of sources. Therefore, Top-N query is more and more widely used, and isaffecting our work, study, even all aspects of life. In recent years, there have been a lot ofpeople committed to propose accurate and efficient algorithm to solve the problem.However, these algorithms still can’t meet our requirements. A critical reason is that theconditions of most algorithms are relatively harsh. For example, the threshold algorithm(TA) demands that the ranking function is monotone, that the query point is fixed, and thatTA scans the sorted index lists unidirectionally, i.e., top down. But in many applications,these conditions can’t be met simultaneously, such as, arbitrary query point, non-monotoneranking function. At the same time, the information retrieval in the increasingly boomingInternet adopts IR retrieval way, that is, the keyword query. But the kind of IR system hasobvious deficiencies. One of deficiencies is that it can only provide limited structured dataquery capabilities. The other is that it is lack of query optimization mechanism.Aiming at those problems, this paper employs the fundamental principle of FunctionalAnalysis (i.e., norm equivalence theorem) and proposes an approach for evaluating Top-Nqueries in n-dimensional normed spaces. Our technique can alleviate the limitations ofTA-like methods and solve problems that the old methods can’t solve. Our work mainlyincludes the following aspects.Firstly, we analyze the query model in the n-dimensional real normed space where:(1)the ranking function is the distance induced from an arbitrarily given norm, which is notnecessarily F-monotone,(2) the query point is an arbitrary point (i.e., a real vector), and (3)the sorted index lists are scanned bidirectionally. TA-like methods are not suitable for thisquery model. We propose new algorithms to answer the Top-N queries of this model.Secondly, for x-monotonic norm distances (say, p-norm distances) in real n-dimensionalnormed spaces, we present three algorithms pTA, pTAz and pNRA for the three cases of data access methods (1) both sorted and random accesses,(2) restricting sorted access, and(3) no random access, respectively.Thirdly, for both different low-dimensional (2,3,4dimension) and high-dimensional (25,50,104dimension) data, we conduct extensive experiments that demonstrate theperformance of our proposed algorithms. The experimental results show that the algorithmsproposed in this paper can quickly find the correct Top-N results.
Keywords/Search Tags:Relational Database, Top-N Query, Ranking Function, Sorted Access, Random Access, Norm Space
PDF Full Text Request
Related items