Research On K-dominant Skyline Algorithm Based On MapReduce And Incomplete Data Stream

Posted on: 2016-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: P P Liu
Full Text: PDF
GTID: 2308330479450950
Subject: Software engineering
Abstract/Summary:
As internet applications develop, data volumes grow continuously, and data types diversify, existing skyline query algorithms return so many results that they no longer help users make decisions. The k-dominant skyline algorithms were therefore proposed. They return fewer results by weakening the dominance relationship between points. Because of the size and dimensionality of the data sets, existing parallel k-dominant skyline algorithms are inefficient in both time and space. Moreover, as applications change and the demand for real-time data increases, k-dominant skyline algorithms are not yet well suited to real-time queries, and their time efficiency needs to be improved. Based on an in-depth analysis of the state of k-dominant skyline algorithms, this thesis proposes two new solutions, introduced as follows.

First, with the widespread use of cloud computing platforms in recent years, the MapReduce parallel processing framework has developed on these platforms and effectively improved the efficiency of distributed execution. The framework distributes programs across clusters made up of thousands of PCs, which effectively overcomes the shortcomings of existing parallel k-dominant skyline algorithms. We therefore combine the relatively mature MapReduce framework with k-dominant skyline algorithms and propose two algorithms, the MapReduce-based One Scan algorithm (MR_OSA) and the MapReduce-based Two Scan algorithm (MR_TSA), together with presorting of the input data sets.

Second, the S-skyline algorithm is proposed for the incomplete data stream environment. It marks the data stream with bits, then computes the mean on every dimension and sorts by it. By comparing points with stronger dominance against weaker points, it reduces the number of comparisons and improves the time efficiency of the algorithm.

Based on this analysis, we design experiments for the MR_OSA and MR_TSA algorithms that vary the amount of data, the data distribution, the number of dimensions, and the cluster size, among other factors. The S-skyline algorithm is tested under different window sizes, different dimensions, and varying values of k to verify its feasibility.
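The weakened dominance relationship mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the definition of k-dominance common in the skyline literature: a point p k-dominates q if p is no worse than q on at least k dimensions and strictly better on at least one of them. The sketch assumes complete data and that smaller values are better. The function names `k_dominates` and `k_dominant_skyline` are hypothetical, and the naive pairwise scan is not the thesis's MR_OSA, MR_TSA, or S-skyline algorithm.

```python
from typing import List, Sequence

Point = Sequence[float]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller values are better)."""
    # Count the dimensions on which p is no worse than q.
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    # Any strictly better dimension is also a "no worse" dimension,
    # so it can always be included in the chosen set of k dimensions.
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive quadratic scan: keep points that no other point k-dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(
            k_dominates(q, p, k) for j, q in enumerate(points) if j != i
        )
    ]


if __name__ == "__main__":
    pts = [(1, 5, 3), (2, 2, 2), (3, 1, 4), (4, 4, 4)]
    # k equal to the number of dimensions gives the conventional skyline.
    print(k_dominant_skyline(pts, 3))
    # A smaller k weakens the dominance condition and returns fewer points.
    print(k_dominant_skyline(pts, 2))
```

With k equal to the number of dimensions, the check reduces to conventional dominance. Lowering k makes it easier for one point to dominate another, which is why the result set shrinks.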
Keywords/Search Tags:k-dominant skyline, Map Reduce, S-skyline