Research On Distributed Data Query Based On Hadoop

Posted on:2019-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:Q Yang

Full Text:PDF

GTID:2428330566974051

Subject:Full-time Engineering

Abstract/Summary:

With the rapid growth of the Internet, the volume of data it generates is increasing quickly, and finding the data a user needs within massive data sets has become a research focus. Skyline queries are widely used in fields such as multi-objective decision analysis and data visualization, because they effectively find the subset of better points in a data set (those not dominated by any other point). As data volumes grow, running the Skyline query algorithm on the Hadoop framework makes it possible to process Skyline queries effectively in a big data environment. However, the size of the Skyline result set grows exponentially with the data dimensionality, and when the result set is too large it no longer gives users precise information. How to select a smaller, more representative set of query results therefore deserves further study.

To address the low efficiency of existing distributed Skyline query algorithms, this thesis optimizes the Skyline query algorithm on the MapReduce framework. The idea is to preprocess the original data set by selecting strong points and using them to filter out most non-Skyline data points before the algorithm starts. In addition, drawing on the processing strategy of the hybrid Skyline query algorithm, a time interval is set and the local Skyline is updated within that interval, which reduces repeated comparisons between data points. The experimental results show that the algorithm filters out non-Skyline points in advance and improves time performance.

To address the very large Skyline result sets that arise in a big data environment and to obtain more representative Skyline results, a Skyline result set optimization algorithm based on the domination number is proposed for the MapReduce framework. The algorithm gives a method for computing the domination number of a data point: while data points are compared for dominance, the number of points each one dominates is counted dynamically, so that the K Skyline points with the highest domination numbers can be returned to the user to represent the Skyline result set. The experimental results show that the algorithm effectively controls the size of the Skyline result set and has good time and space performance.
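
The two ideas in the abstract (pre-filtering with strong points, then ranking Skyline points by domination number) can be illustrated with a minimal single-machine Python sketch. It is not the thesis's MapReduce implementation. It assumes that smaller values are better in every dimension and, as a purely hypothetical choice, picks the points with the smallest coordinate sum as the "strong" filter points. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller values are better in all dimensions.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def select_filter_points(points: Sequence[Point], m: int = 1) -> List[Point]:
    """Hypothetical 'strong point' choice: the m points with the smallest sum."""
    return sorted(points, key=sum)[:m]


def prefilter(points: Sequence[Point], filters: Sequence[Point]) -> List[Point]:
    """Drop every point dominated by some filter point (cannot be in the Skyline)."""
    return [q for q in points if not any(dominates(f, q) for f in filters)]


def skyline(candidates: Sequence[Point]) -> List[Point]:
    """Naive pairwise Skyline: keep points that no other candidate dominates."""
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


def top_k_by_domination(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Return the k Skyline points that dominate the most points of the full set."""
    candidates = prefilter(points, select_filter_points(points))
    sky = skyline(candidates)
    # Domination numbers are counted against the full, unfiltered data set.
    counted = [(p, sum(dominates(p, q) for q in points)) for p in sky]
    counted.sort(key=lambda pc: pc[1], reverse=True)
    return counted[:k]


if __name__ == "__main__":
    data = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 6),
            (6, 5), (7, 2), (8, 3), (9, 1), (6, 6)]
    for point, count in top_k_by_domination(data, k=3):
        print(point, "dominates", count, "points")
```

In a MapReduce setting, the filtering and local Skyline steps would typically run per partition in the map phase and be merged in the reduce phase. That distribution logic is omitted here.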

Keywords/Search Tags:

Big data, Skyline query, Hadoop, MapReduce, filter, Domination number

Related items

1	Research And Improvement Of Skyline Query Algorithm In MapReduce Framework
2	Research On Skyline Query Based On MapReduce
3	Efficient K-dominant Skyline Query Based On Dominate Hierarchical Tree In MapReduce Environment
4	Research On Domination In Graphs
5	Research And Implementation Of Skyline Query Algorithm In LBSN Environment
6	Research Of Dynamic Skyline Query Processing Approach In MapReduce
7	Research On Reverse Skyline Query Algorithm Based On SR Tree Under MapReduce Model
8	Research Of Query Processing Method On Top-k Skyline In Mapreduce
9	Skyline Queries For Moving Objects Based On MapReduce
10	Research On Skyline Query Processing Techniques