Research On Skyline Query Based On MapReduce

Posted on:2017-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:W X Cui

Full Text:PDF

GTID:2348330485952689

Subject:Computer technology

Abstract/Summary:

Skyline query is a typical multi-objective optimization query and is widely applied in multi-objective optimization, data mining, and other fields. Most existing Skyline query processing algorithms assume that the data set resides on a single server and are designed as serial algorithms for that server. With the rapid growth of data, especially in the context of big data, traditional serial Skyline algorithms running on a single computer can no longer meet users' needs. The main work of this paper is as follows.

(1) This paper proposes a distributed parallel Skyline algorithm called BAPS (Balanced Angular Parallel Skyline) for big data sets, built on the popular distributed programming framework MapReduce. BAPS improves the existing angle-based partition strategy by analyzing its effect on MapReduce, and proposes a partition strategy based on Balanced Angular. The proposed strategy uses the mean of every dimension of all data as the partition criterion and divides the k-dimensional space into 2^k-1 areas holding the same amount of data, in order to keep the cluster load balanced. In addition, to reduce the computation in the Reduce phase, a data filtering policy using the local Skyline points is applied at the Map end. The experimental results show that BAPS can significantly improve query performance.

(2) This paper proposes a three-phase solution (preprocessing phase, computing phase, and summary phase) based on MapReduce for processing the discrete probabilistic Skyline query (P-Skyline) on uncertain data sets. The preprocessing phase first sorts the data by their Hilbert values, so that data points that are close to each other in space stay close in the sorted order, and then builds an optimized minimal pruning set from the sorted data in order to maximize pruning in the computing phase. In the computing phase, this paper proposes a hierarchical policy based on the optimal dimension value to improve the pruning ability. The hierarchical policy can quickly determine whether a data point will be filtered according to its hierarchical level. The experimental results show that the proposed solution performs well in high-dimensional space.
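The partition-then-filter idea behind BAPS can be illustrated with a minimal single-process sketch. This is not the thesis's implementation: it assumes smaller values are better, non-negative coordinates, and a simple threshold at the mean of each hyperspherical angle, and every function name (`dominates`, `skyline`, `angles`, `partition_key`, `map_phase`, `reduce_phase`) is hypothetical. The Map step computes a local Skyline per angular partition, and the Reduce step merges the surviving points.

```python
import math
from statistics import mean

def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Block-nested-loop Skyline: keep only points that no other point dominates."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result

def angles(p):
    """The k-1 hyperspherical angles of a non-negative k-dimensional point."""
    out = []
    for i in range(len(p) - 1):
        tail = math.sqrt(sum(x * x for x in p[i + 1:]))
        out.append(math.atan2(tail, p[i]))
    return out

def partition_key(p, thresholds):
    """One bit per angle: 1 if the angle exceeds its threshold (assumed criterion)."""
    key = 0
    for a, t in zip(angles(p), thresholds):
        key = (key << 1) | (1 if a > t else 0)
    return key

def map_phase(points):
    """Assign each point to an angular partition, then filter to local Skylines."""
    per_point = [angles(p) for p in points]
    thresholds = [mean(col) for col in zip(*per_point)]
    parts = {}
    for p in points:
        parts.setdefault(partition_key(p, thresholds), []).append(p)
    return {k: skyline(v) for k, v in parts.items()}

def reduce_phase(local_skylines):
    """Merge the local Skylines and compute the global Skyline."""
    merged = [p for pts in local_skylines.values() for p in pts]
    return skyline(merged)

points = [(1, 9), (2, 7), (3, 8), (5, 5), (6, 6), (7, 2), (9, 1), (8, 8)]
print(sorted(reduce_phase(map_phase(points))))
```

Because every global Skyline point is also a local Skyline point of its own partition, filtering at the Map end discards only points that could never appear in the final result; it only shrinks the input that the Reduce step has to compare.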

Keywords/Search Tags:

MapReduce, Skyline, Data partitioning, probabilistic Skyline, Hilbert curve, Hierarchical policy

Related items

1	Research On Skyline Query Processing Techniques
2	Research On Efficient Keyword Skyline Query Algorithm
3	Research On Computing Skyline Over Large Scale De-identification Policies
4	Research On K-dominant Skyline Algorithm Based On MapReduce And Incomplete Data Stream
5	Efficient K-dominant Skyline Query Based On Dominate Hierarchical Tree In MapReduce Environment
6	Research On Skyline Computation In Multiple Environments
7	Research On Distributed Probabilistic Skyline
8	Study On Skyline Query Processing In Distributed Environment
9	Skyline Query Research For Massive RDF Data Under Distributed Computing Environments
10	The Research Of Skyline Queries Algorithms Based On MapReduce