Font Size: a A A

Research On Approximate Top-k Query Algorithm On Massive Data

Posted on:2021-01-14Degree:MasterType:Thesis
Country:ChinaCandidate:N DongFull Text:PDF
GTID:2428330611497883Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology,the amount of data in the world has exploded.The top-k query method which can retrieve the target data quickly and effectively is a hot topic in the current computer research.In the massive data,using the traditional top-k query method to return accurate results often requires a long response time.Therefore,it is of great research significance to exchange approximate top-k queries with faster response speed at the expense of accuracy.This paper mainly studies the approximate top-k query algorithm on massive data from the aspects of certainty guarantee and probabilistic guarantee.In the research of approximate top-k query with certainty guarantee,the baseline algorithm BACG algorithm is proposed in this paper.The baseline algorithm adds an approximate factor to the classical TA algorithm,and widens the threshold limit,so that the returned approximate query results have certainty guarantee.In order to avoid excessive space-time overhead caused by random access,an approximate top-k query algorithm AQCG algorithm with certainty guarantee based on sorted list is proposed in this paper.The AQCG algorithm includes pre-processing,growing phase and shrinking phase.Preference scanning is added in the growing phase of the AQCG algorithm,so that unnecessary tuples on the attribute columns can be skipped during the scanning process,converge to the threshold as soon as possible,and enter the shrinking phase.In order to end the query process as soon as possible,the growing cut and shrinking cut are added to cut most of the tuples,which greatly reduces the number of I/O.Experimental results show that the AQCG algorithm can effectively calculate the certainty approximate top-k query results.In the research of approximate top-k query with probabilistic guarantee,a samplingbased approximate query method and the approximate top-k query algorithm TAPG algorithm with probabilistic guarantee are proposed in this paper.The approximate query method based on sampling adopts the weighted sampling method to pre-compute to generate synopses,and continuously updates the synopses according to the distribution of the data to answer the approximate query.The error is estimated by the query results to ensure that the approximate top-k query results can meet the requirements of users.A probability threshold test to the query is added on the TAPG algorithm,which can reduce the number of tuples scanned during the query process.By adding a priority queue to implement index pruning periodically,most tuples are cut off.The probability of these tuples appearing in the top-k query results is very small,greatly reducing the number of I/O.The results show that the TPAG algorithm has satisfactory performance in terms of probability guarantee and query cost.
Keywords/Search Tags:certainty guarantee, probability guarantee, approximate query, top-k query
PDF Full Text Request
Related items