Research On Skyline Query Based On MapReduce

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: W X Cui
GTID: 2348330485952689
Subject: Computer technology
Abstract/Summary:
Skyline query is a typical multi-objective optimization query and is widely applied in multi-objective optimization, data mining and other fields. Most existing Skyline query processing algorithms assume that the data set is stored on a single server and are designed as serial algorithms for that server. With the rapid growth of data, especially in the context of big data, traditional serial Skyline algorithms running on a single computer can no longer meet users' needs. The main work of this paper is as follows.

(1) This paper proposed a distributed parallel Skyline algorithm for big data sets, called BAPS (Balanced Angular Parallel Skyline), based on the popular distributed programming framework MapReduce. After analyzing how the existing angle-based partitioning strategy affects MapReduce, BAPS improved it and introduced a partitioning strategy based on balanced angles. The proposed strategy used the mean of each dimension over all data as the partitioning criterion and divided the k-dimensional space into 2^(k-1) regions holding the same amount of data, in order to balance the load across the cluster. In addition, to reduce computation in the Reduce phase, a data filtering policy based on local Skyline points was applied at the Map side. The experimental results showed that BAPS can significantly improve query performance.

(2) This paper proposed a three-phase solution based on MapReduce (preprocessing phase, computing phase and summary phase) for processing discrete probabilistic Skyline queries (P-Skyline) on uncertain data sets. The preprocessing phase first sorts the data by their Hilbert values so that data points that are close to each other end up adjacent, and then builds an optimized minimal pruning set from the sorted data in order to maximize pruning in the computing phase. In the computing phase, this paper proposed a hierarchical policy based on the optimal dimension value to improve pruning ability; the policy can quickly determine whether each data item will be filtered out according to its hierarchical level. The experimental results showed that the proposed solution performs well in high-dimensional spaces.
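The angular partitioning and Map-side filtering of BAPS in contribution (1) can be illustrated with a single-process simulation of the Map, shuffle and Reduce steps. This is a minimal sketch under stated assumptions, not the thesis's implementation: smaller values are assumed better in every dimension, coordinates are assumed non-negative, and each of the k-1 hyperspherical angles is split at the angle of the per-dimension mean vector. The abstract does not say exactly how BAPS turns the means into balanced boundaries, so this choice need not produce equal-sized partitions. All function names (`angles`, `partition_id`, `map_phase`, `reduce_phase`, `baps_like_skyline`) are hypothetical.

```python
import math
from collections import defaultdict


def dominates(p, q):
    """True if p dominates q: p <= q in every dimension and p < q in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points):
    """Block-nested-loop skyline: keep a window of mutually non-dominated points."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, discard it
        window = [w for w in window if not dominates(p, w)]  # drop points that p dominates
        window.append(p)
    return window


def angles(p):
    """Hyperspherical angles phi_1..phi_{k-1} of a non-negative k-dimensional point."""
    return tuple(
        math.atan2(math.sqrt(sum(v * v for v in p[i + 1:])), p[i])
        for i in range(len(p) - 1)
    )


def partition_id(p, thresholds):
    """One bit per angle (above or below its threshold), giving at most 2^(k-1) partitions."""
    pid = 0
    for i, (phi, t) in enumerate(zip(angles(p), thresholds)):
        if phi > t:
            pid |= 1 << i
    return pid


def map_phase(split, thresholds):
    """Hypothetical mapper: group a split by partition and emit only its local skyline points."""
    groups = defaultdict(list)
    for p in split:
        groups[partition_id(p, thresholds)].append(p)
    for pid, pts in groups.items():
        for p in bnl_skyline(pts):  # Map-side filtering with local skyline points
            yield pid, p


def reduce_phase(points):
    """Hypothetical reducer: skyline of all candidates shuffled to one partition."""
    return bnl_skyline(points)


def baps_like_skyline(splits):
    """Simulate map -> shuffle -> reduce, then merge the per-partition skylines."""
    data = [p for split in splits for p in split]
    k = len(data[0])
    mean = tuple(sum(p[d] for p in data) / len(data) for d in range(k))
    thresholds = angles(mean)  # assumption: split each angle at the mean vector's angle
    shuffled = defaultdict(list)
    for split in splits:
        for pid, p in map_phase(split, thresholds):
            shuffled[pid].append(p)
    candidates = [p for pid in sorted(shuffled) for p in reduce_phase(shuffled[pid])]
    # Points in different angular partitions can still dominate each other.
    return bnl_skyline(candidates)


splits = [[(1, 9), (3, 3), (4, 4)], [(9, 1), (2, 8), (5, 5)]]
print(sorted(baps_like_skyline(splits)))  # [(1, 9), (2, 8), (3, 3), (9, 1)]
```

Discarding a point in the mapper is safe because a point that is not in its local skyline is dominated by some other point and therefore cannot be in the global skyline. The final merge is still required, since angular partitioning reduces, but does not eliminate, dominance across partitions.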
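For contribution (2), the sketch below illustrates two ingredients named in the abstract: ordering data by Hilbert value, and computing skyline probabilities over discrete uncertain objects. It assumes the common discrete model in which each object is a set of mutually exclusive instances with probabilities and different objects are independent. Under that model an instance's skyline probability is its own probability multiplied, for every other object, by one minus that object's probability mass that dominates the instance, and an object's skyline probability is the sum over its instances. The Hilbert key is the classic two-dimensional grid mapping and is used here only to order instances. The thesis's minimal pruning set, hierarchical policy and MapReduce phases are not reproduced, and all names (`hilbert_index`, `skyline_probabilities`, `p_skyline`, `grid`) are hypothetical.

```python
from collections import defaultdict


def dominates(p, q):
    """True if p dominates q (smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve has the right orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def skyline_probabilities(objects, grid):
    """objects: {name: [((x, y), prob), ...]}; instances of one object are mutually exclusive."""
    # Preprocessing (sketch): order all instances by Hilbert key so nearby ones are adjacent.
    instances = sorted(
        ((name, pt, pr) for name, insts in objects.items() for pt, pr in insts),
        key=lambda t: hilbert_index(grid, t[1][0], t[1][1]),
    )
    result = {name: 0.0 for name in objects}
    for name, pt, pr in instances:
        # Probability mass of each other object that dominates this instance.
        dominating_mass = defaultdict(float)
        for other, q, qpr in instances:
            if other != name and dominates(q, pt):
                dominating_mass[other] += qpr
        prob = pr
        for mass in dominating_mass.values():
            prob *= 1.0 - mass  # objects are assumed independent
        result[name] += prob
    return result


def p_skyline(objects, threshold, grid):
    """Names of objects whose skyline probability reaches the threshold."""
    probs = skyline_probabilities(objects, grid)
    return sorted(name for name, p in probs.items() if p >= threshold)


objects = {
    "A": [((1, 1), 0.5), ((6, 6), 0.5)],
    "B": [((2, 2), 0.6), ((0, 7), 0.4)],
    "C": [((5, 5), 1.0)],
}
print(p_skyline(objects, threshold=0.3, grid=8))  # ['A', 'B']
```

In this example the skyline probabilities are about 0.5 for A, 0.7 for B and 0.2 for C, so only A and B pass the 0.3 threshold. Sorting by Hilbert key does not change the probabilities; it only places nearby instances next to each other, which is the locality the preprocessing phase relies on when building its pruning set. The quadratic dominance scan is the work such pruning is meant to reduce.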
Keywords/Search Tags: MapReduce, Skyline, Data partitioning, Probabilistic Skyline, Hilbert curve, Hierarchical policy