
Research Of Massive Skyline Computing Based On MapReduce

Posted on: 2015-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wang
Full Text: PDF
GTID: 2298330467984631
Subject: Computer application technology
Abstract/Summary:
Recently, with the rapid development of Internet applications and the widespread adoption of cloud computing, data volumes have been growing explosively. Finding the interesting data that helps people make effective decisions is an urgent problem. Skyline computing is used to solve multi-objective decision-making problems. Existing research on Skyline computing falls into centralized processing and distributed processing. Research on centralized processing is mature and includes algorithms such as BNL (block nested loop), D&C (divide and conquer), and SFS (sort-filter-skyline). With the rise of big data, distributed processing has become important. The MapReduce model proposed by Google offers high fault tolerance and good scalability, which suits data-intensive applications, so MapReduce is a good platform for Skyline computing.

When Skyline computing is performed on MapReduce, it is necessary to consider how to partition the dataset. Existing approaches include random partitioning, grid partitioning, and angle-based partitioning. Random partitioning is simple but unstable. Grid partitioning only applies to low-dimensional datasets. Angle-based partitioning first projects the coordinates onto a hypersphere and then partitions the dataset according to the hypersphere coordinates; it prunes more data when computing the local results, but the coordinate conversion before partitioning is complex and time-consuming. This paper instead employs a hyperplane-projections-based partitioning method: it first projects the coordinates onto a hyperplane and then partitions the dataset according to the hyperplane coordinates. This method inherits the advantage of angle-based partitioning, namely that it keeps the local result sets small, and at the same time remedies its drawback, in that the coordinate conversion is simple and fast.
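To make the dominance relation behind Skyline computing concrete, here is a minimal Python sketch of the BNL (block nested loop) algorithm mentioned above, assuming minimization in every dimension; the function names are illustrative and not taken from the thesis.

```python
def dominates(p, q):
    """p dominates q (minimization): no worse in every dimension,
    strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def bnl_skyline(points):
    """Block-nested-loop Skyline: maintain a window of mutually
    incomparable points, comparing each incoming point against it."""
    window = []
    for p in points:
        # Discard p if some window point already dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise, evict window points that p dominates and keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

# Example: only (1, 1) survives, since it dominates every other point.
print(bnl_skyline([(1, 2), (2, 1), (3, 3), (1, 1)]))  # → [(1, 1)]
```

The window is the reason BNL only suits centralized, memory-resident processing: every point must be compared against it sequentially, which motivates the partitioned, distributed approach discussed next.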
This paper proposes MR-HPP (hyperplane-projections-based partitioning), a Skyline algorithm built on hyperplane-projection partitioning, and optimizes the computation in the merging-filter stage of MR-HPP and in the MapReduce shuffle. To verify the effectiveness of MR-HPP, we conduct extensive comparative experiments on Hadoop with different partitioning strategies for Skyline computing. The experimental results show that our algorithm is scalable, efficient, and stable.
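As a rough single-process illustration of the pipeline described above (not the thesis's actual MR-HPP implementation), the following Python sketch simulates the map phase (assigning each point to a partition via its hyperplane projection), the reduce phase (a local Skyline per partition), and the merging-filter phase (a global Skyline over the local results). The `splits` parameter and all function names are assumptions for illustration, and coordinates are assumed positive so the projection onto the hyperplane sum(x_i) = 1 is well defined.

```python
from collections import defaultdict

def dominates(p, q):
    """p dominates q under minimization: no worse everywhere,
    strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def skyline(points):
    """Brute-force Skyline, used for both the per-partition reduce
    and the final merging-filter step."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def partition_id(p, splits):
    """Map phase: project p onto the hyperplane sum(x_i) = 1 by
    normalizing with the coordinate sum (assumed positive), then grid
    the first d-1 projected coordinates into `splits` intervals each."""
    s = sum(p)
    proj = [x / s for x in p]
    cell = 0
    for c in proj[:-1]:
        cell = cell * splits + min(int(c * splits), splits - 1)
    return cell

def mr_hpp_skyline(points, splits=2):
    # Map: route each point to a partition by its hyperplane coordinates.
    buckets = defaultdict(list)
    for p in points:
        buckets[partition_id(p, splits)].append(p)
    # Reduce: compute the local Skyline of each partition independently.
    local = [q for bucket in buckets.values() for q in skyline(bucket)]
    # Merging-filter: the global Skyline is the Skyline of the local results.
    return skyline(local)
```

Because projected points in the same partition are close in direction, each local Skyline stays small, which is the pruning advantage the partitioning scheme is designed to preserve; the merging-filter phase then only has to process the union of these small local result sets.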
Keywords/Search Tags: Skyline computing, Big data, MapReduce, Hyperplane-projections-based partition