The Research Of Skyline Queries Algorithms Based On MapReduce

Posted on: 2015-08-22, Degree: Master, Type: Thesis
Country: China, Candidate: W J Li
GTID: 2428330488999856, Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of database and storage technology, the amount of data that people can collect and use keeps growing, and finding useful information quickly and accurately is of great significance. A Skyline query finds the set of objects in a multi-dimensional dataset that are not dominated by any other object. Skyline queries are widely used in data mining and multi-objective decision making. However, with the rapid growth of data size, a single-node architecture can no longer meet the computational requirements. MapReduce is a parallel programming framework that uses a cluster of commodity computers to process large-scale tasks, and it encapsulates scheduling, error handling, inter-node communication and other complicated details of the cluster. MapReduce is well suited to large-scale data processing and scales well. Although some preliminary work has been done on MapReduce, the existing algorithms cannot meet the requirements of Skyline computation. This thesis aims to improve the performance of Skyline computation on the MapReduce framework. The main work and contributions are as follows.

Firstly, this thesis analyzes the existing MapReduce-based algorithms and finds that they lack effective preprocessing. It therefore proposes a Skyline query algorithm with efficient preprocessing, named MRFS (MapReduce based Filter Skyline). MRFS extracts from the original dataset a small set of points with high dominating ability and uses this set to filter the dataset, eliminating some of the data objects that cannot belong to the Skyline set. The algorithm has two phases: it computes partial Skyline sets in parallel in the map phase, then merges them in a single reduce job to obtain the final Skyline set. Experimental results show that the proposed algorithm improves time efficiency by 20% to 30% over the existing algorithms.

Secondly, no prior work has addressed k-dominant Skyline queries over high-dimensional data spaces on the MapReduce framework. This thesis introduces the MapReduce framework to k-dominant Skyline queries and proposes three algorithms for different scenarios: a two-scan algorithm based on MapReduce (MR-TSA), an indexing algorithm based on MapReduce (MR-IBA), and an improved algorithm based on simple sorting (MR-SIA). The results show that the proposed algorithms are efficient and feasible.
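
The ideas in the abstract can be illustrated with a small single-process sketch in Python. It is not the thesis's implementation: the abstract does not say how MRFS chooses its filter points or how the three k-dominant algorithms work, so the choices below are assumptions. The sketch assumes smaller values are better in every dimension, picks filter points by smallest coordinate sum (a hypothetical heuristic), simulates the map and reduce phases with ordinary loops, and uses the commonly cited definition of k-dominance with a brute-force check. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Assumed definition: p is no worse than q in at least k dimensions
    and strictly better in at least one of them."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    return no_worse >= k and any(a < b for a, b in zip(p, q))


def local_skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loop skyline of one partition."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be a skyline point
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def pick_filter_points(points: Sequence[Point], size: int) -> List[Point]:
    """Hypothetical heuristic: points with the smallest coordinate sum."""
    return sorted(points, key=sum)[:size]


def filter_skyline(points: Sequence[Point],
                   num_partitions: int = 3,
                   filter_size: int = 2) -> List[Point]:
    """Two-phase skyline with a pre-filter, simulated in one process."""
    filters = pick_filter_points(points, filter_size)
    partitions = [points[i::num_partitions] for i in range(num_partitions)]

    # "Map" phase: drop points dominated by a filter point, then compute
    # a partial skyline for each partition independently.
    partial: List[Point] = []
    for part in partitions:
        survivors = [p for p in part
                     if not any(dominates(f, p) for f in filters)]
        partial.extend(local_skyline(survivors))

    # "Reduce" phase: merge the partial skylines into the final skyline.
    return local_skyline(partial)


def k_dominant_skyline(points: Sequence[Point], k: int) -> List[Point]:
    """Brute force: keep points that no other point k-dominates.
    k-dominance is not transitive, so every pair is compared."""
    return [p for p in points
            if not any(k_dominates(q, p, k) for q in points if q != p)]
```

Calling `filter_skyline(points)` on a list of equal-length tuples returns the same set of points as `local_skyline(points)`, because filter points are drawn from the data itself and so only remove points that are truly dominated. For k smaller than the number of dimensions, `k_dominant_skyline` can return fewer points than the ordinary Skyline, and may even return none, since two points can k-dominate each other.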
Keywords/Search Tags: Database, Skyline queries, Data mining, Multi-objective decision, MapReduce