Font Size: a A A

Design And Implementation Of Top-k Query System On Massive Data

Posted on:2020-04-06Degree:MasterType:Thesis
Country:ChinaCandidate:C SongFull Text:PDF
GTID:2428330611999665Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The top-k query on massive data is a very important query.The top-k query returns the k objects with the highest score according to the specified ranking function.This paper studies two extensions of top-k query: top-k selection query and top-k skyline query.Firstly,the selection condition in top-k selection query is based on attribute value of the object itself,while the top-k skyline query is based on the relationship between the objects as the selection condition.Finally,returns the k objects that satisfying the selection condition and have the highest score to provide decision for the user.Firstly,This paper proposes a top-k selection query baseline algorithm BASel,which scans the data set sequentially and selects the k tuples that satisfy the selection conditions and have the highest score.In order to improve the speed of top-k selection query,this paper proposes a pre-sorted top-k selection query algorithm PTS.The PTS algorithm pre-sorts the data set,and sequentially scans the ordered tables to obtain the top-k selection query results.According to the characteristics of the data distribution,this paper proposes an early termination condition to reduce the number of I/O times.In order to further improve the efficiency of PTS algorithm,this paper propose two pruning methods: selective pruning and score pruning.Based on the pre-sorting above,the PTS algorithm combines two pruning strategies to further improve query speed.Experiments show that the PTS algorithm can effectively compute the query results of top-k selection on massive data.Secondly,This paper proposes a top-k skyline query baseline algorithm BASky.The query process of BASky algorithm is divided into two stages.In order to improve the query efficiency of BASky algorithm,this paper proposes an early-end top-k skyline query algorithm ETS,which preprocesses the data set.The execution process of ETS algorithm is divided into two stages.Because of the order of the data set,early termination conditions are proposed in the first stage and the second stage respectively to reduce I/O times.At the same time,the number of candidate tuples maintained in the first stage is reduced by combining the pruning strategy of ETS algorithm in the first stage,thus reducing the spatial complexity of ETS algorithm.The experimental results show that the first stage of the ETS algorithm can prune most of the tuples and can effectively compute top-k skyline query results on massive data.Finally,this paper takes the above algorithm as the core,and realizes the massive data top-k query system to provide decision support for users.A thorough test of the system shows that the system can satisfy the expected requirements.
Keywords/Search Tags:massive data, top-k selection query, top-k skyline query, early end
PDF Full Text Request
Related items