
Research And Improvement Of Skyline Query Algorithm In MapReduce Framework

Posted on: 2017-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
GTID: 2308330485462192
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, data volume and variety have increased sharply, and finding information that is useful to users in big data has become crucial. As a classical query algorithm, the Skyline query effectively returns an optimal subset of the original dataset. Applying Skyline algorithms to the MapReduce parallel computing framework has become an effective method for Skyline queries in big data environments. The time and space overhead of Skyline queries on high-dimensional big data is high, so improving their efficiency has become a research hotspot. In addition, because the size of the Skyline result grows exponentially as the dimensionality increases, the Skyline result on big data is too large to provide accurate information to users, and how to select a smaller and more representative result is worth further research.

To improve the efficiency of Skyline queries in big data environments, an optimal extended-domination Skyline query algorithm based on the MapReduce framework is proposed. The algorithm defines extended domination over the whole data space, verifies its high efficiency, and reduces the original dataset by using extended domination. By optimizing the calculation of the domination ability of data points and using a MapReduce composite key to sort the dataset according to domination ability, the filtration of non-Skyline points is accelerated. Experimental results show that the algorithm can effectively speed up the filtration of non-Skyline points and improve the time performance of Skyline queries.

For the case where the Skyline result is too large in big data environments, a dominant-number-based Skyline result optimization algorithm on the MapReduce framework is proposed in order to obtain more representative Skyline results. The algorithm calculates the dominant number of Skyline points dynamically during domination comparisons and returns the k Skyline points with the highest dominant numbers. Experimental results show that the algorithm can effectively control the size of the Skyline result and has good time and space performance.
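
To make the terms in the abstract concrete, the following is a minimal single-machine sketch of plain domination, the Skyline, and a top-k selection by dominant number. It is not the thesis's MapReduce algorithm and does not implement extended domination or the composite-key sort. It assumes that smaller values are preferred in every dimension and that the dominant number of a point is the count of points it dominates. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when
    p is no worse than q everywhere and strictly better somewhere.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline_with_dominant_numbers(points: List[Point]) -> List[Tuple[Point, int]]:
    """Brute-force O(n^2) Skyline; pairs each Skyline point with the
    number of input points it dominates (its assumed dominant number)."""
    result = []
    for p in points:
        # A Skyline point is one that no other point dominates.
        if any(dominates(q, p) for q in points):
            continue
        count = sum(1 for q in points if dominates(p, q))
        result.append((p, count))
    return result


def top_k_skyline(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Return the k Skyline points with the highest dominant numbers."""
    ranked = sorted(
        skyline_with_dominant_numbers(points),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    data = [(1, 9), (2, 4), (3, 3), (5, 1), (4, 5), (6, 6), (2, 7)]
    print(top_k_skyline(data, 2))
```

In a MapReduce setting, a comparable computation would typically be split into local Skylines per partition followed by a merge step; the sketch above only illustrates the definitions on a small in-memory list.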
Keywords/Search Tags: Big Data, Skyline Query, MapReduce, Extended-domination, Dominant Number