Font Size: a A A

Research On Reverse Top-k Query And Distributed Application

Posted on:2022-06-11Degree:MasterType:Thesis
Country:ChinaCandidate:Z Y ZhaoFull Text:PDF
GTID:2518306572491234Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The rapid development of information technology has brought about the explosive growth of data,and how to obtain Top-k results in massive data has become one of the hot issues in the field of computer research.After years of development,Top-k query technology has become quite mature.However,Top-k query is from the user's perspective of which the target is to retrieve the top k results that best match their search criteria.This article proposes a reverse Top-k query,which provides another perspective,that is,from the perspective of the merchant and decide how to judge the popularity of a product in the market and provide product manufacturers with greater benefits.Therefore,the research on the reverse Top-k query algorithm has important commercial value and research significance.This article mainly proposes an efficient reverse Top-k query algorithm and improves it in a distributed environment.This article gives the definition of reverse Top-k query and introduces two reverse Topk queries,namely,Monochromatic and Bichromatic reverse Top-k queries.For the Monochromatic reverse Top-k query,this paper explores the reverse Top-k algorithm in the two-dimensional space from two theorems,and expands it to the k-dimensional space.For the Bichromatic reverse Top-k query,this paper proposes the reverse Top-k threshold algorithm(TA),and illustrates the case and the optimization method.In order to improve the operating efficiency of the Bichromatic reverse Top-k query algorithm,this paper also proposes a reverse Top-k angle list method,which is an index structure based on space division.This article demonstrates the construction method of this grid and gives two different recursive termination conditions,namely,space restriction strategy and cost guarantee strategy,to adapt to different usage scenarios.At the same time,it analyzes if there is a data update and how should the algorithm be adjusted.When the amount of data increases to a certain extent,the computational cost of the centralized algorithm will be too high,resulting in too long processing time.Therefore,this paper proposes a reverse Top-k query algorithm in a distributed scenario,namely the DRT algorithm.The basic idea is Map Reduce,and the algorithm is specifically divided into three phases,the Map phase,the Shuffle phase,and the Reduce phase.At the same time,in order to ensure the correctness of the results,this paper adopts corresponding filtering methods for data points and weighting vectors,namely filtering based on the dominance relationship and filtering on the weighting vector.For the above algorithm,this article has conducted many experiments to evaluate the effectiveness and efficiency of the algorithm.The experiment uses real and synthetic data sets to conduct experiments to test the performance of the algorithm in different data distributions,data dimensions and different data sets.The experimental results show that the algorithm is efficient.
Keywords/Search Tags:Reverse Top-k, Top-k query, Threshold, Distributed, Parallel computing
PDF Full Text Request
Related items