
Research Of Massive Skyline Computing Based On MapReduce

Posted on: 2015-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wang
Full Text: PDF
GTID: 2298330467984631
Subject: Computer application technology
Abstract/Summary:
Recently, with the rapid development of Internet applications and the widespread adoption of cloud computing, data volumes have been growing explosively. Finding the interesting data that helps people make effective decisions is an urgent problem. Skyline computing is used to solve multi-objective decision-making problems. Existing research on Skyline computing falls into centralized processing and distributed processing. Research on centralized processing is mature and includes algorithms such as BNL (block nested loop), D&C (divide and conquer), and SFS (sort-filter-skyline). With the rise of big data, distributed processing has become important. The MapReduce model proposed by Google offers high fault tolerance and good scalability, which suits data-intensive applications, so MapReduce is a good platform for Skyline computing.

When Skyline computing is performed on MapReduce, it is necessary to consider how to partition the dataset. Existing approaches include random partitioning, grid partitioning, and angle-based partitioning. Random partitioning is simple but unstable. Grid partitioning only applies to low-dimensional datasets. Angle-based partitioning first projects the coordinates onto a hypersphere and then partitions the dataset according to the hypersphere coordinates; it prunes more data when computing the local results, but the coordinate conversion before partitioning is complex and time-consuming. This paper instead employs a hyperplane-projections-based partitioning method: it first projects the coordinates onto a hyperplane and then partitions the dataset according to the hyperplane coordinates. This method inherits the advantage of angle-based partitioning, namely that it keeps the local result sets small, and at the same time remedies its drawback, in that the coordinate conversion is simple and fast.
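To make the dominance relation behind Skyline computing concrete, here is a minimal Python sketch of the BNL (block nested loop) algorithm mentioned above, assuming minimization in every dimension; the function names are illustrative and not taken from the thesis.

```python
def dominates(p, q):
    """p dominates q (minimization): no worse in every dimension,
    strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def bnl_skyline(points):
    """Block-nested-loop Skyline: maintain a window of mutually
    incomparable points, comparing each incoming point against it."""
    window = []
    for p in points:
        # Discard p if some window point already dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise, evict window points that p dominates and keep p.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window

# Example: only (1, 1) survives, since it dominates every other point.
print(bnl_skyline([(1, 2), (2, 1), (3, 3), (1, 1)]))  # → [(1, 1)]
```

The window is the reason BNL only suits centralized, memory-resident processing: every point must be compared against it sequentially, which motivates the partitioned, distributed approach discussed next.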
This paper proposes MR-HPP (hyperplane-projections-based partitioning), a Skyline algorithm built on hyperplane-projection partitioning, and optimizes the computation in the merging-filter stage of MR-HPP and in the MapReduce shuffle. To verify the effectiveness of MR-HPP, we conduct extensive comparative experiments on Hadoop with different partitioning strategies for Skyline computing. The experimental results show that our algorithm is scalable, efficient, and stable.
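As a rough single-process illustration of the pipeline described above (not the thesis's actual MR-HPP implementation), the following Python sketch simulates the map phase (assigning each point to a partition via its hyperplane projection), the reduce phase (a local Skyline per partition), and the merging-filter phase (a global Skyline over the local results). The `splits` parameter and all function names are assumptions for illustration, and coordinates are assumed positive so the projection onto the hyperplane sum(x_i) = 1 is well defined.

```python
from collections import defaultdict

def dominates(p, q):
    """p dominates q under minimization: no worse everywhere,
    strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and \
           any(a < b for a, b in zip(p, q))

def skyline(points):
    """Brute-force Skyline, used for both the per-partition reduce
    and the final merging-filter step."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def partition_id(p, splits):
    """Map phase: project p onto the hyperplane sum(x_i) = 1 by
    normalizing with the coordinate sum (assumed positive), then grid
    the first d-1 projected coordinates into `splits` intervals each."""
    s = sum(p)
    proj = [x / s for x in p]
    cell = 0
    for c in proj[:-1]:
        cell = cell * splits + min(int(c * splits), splits - 1)
    return cell

def mr_hpp_skyline(points, splits=2):
    # Map: route each point to a partition by its hyperplane coordinates.
    buckets = defaultdict(list)
    for p in points:
        buckets[partition_id(p, splits)].append(p)
    # Reduce: compute the local Skyline of each partition independently.
    local = [q for bucket in buckets.values() for q in skyline(bucket)]
    # Merging-filter: the global Skyline is the Skyline of the local results.
    return skyline(local)
```

Because projected points in the same partition are close in direction, each local Skyline stays small, which is the pruning advantage the partitioning scheme is designed to preserve; the merging-filter phase then only has to process the union of these small local result sets.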
Keywords/Search Tags: Skyline computing, Big data, MapReduce, Hyperplane-projections-based partition