
Research On Parallel Skyline Algorithms And Their Applications In Cloud Computing Environment

Posted on: 2017-10-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 1318330512469578
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, data volumes have grown exponentially, and a growing number of applications must deal with massive data. Processing such data is difficult because of software and hardware limitations: system resources are often seriously insufficient. Cloud computing has therefore attracted considerable attention from both the academic and business communities. As a dominant programming model in cloud computing, MapReduce has been widely adopted for massive data processing. The skyline operator is a popular and powerful paradigm for extracting interesting objects from massive data. It effectively prunes out uninteresting objects so that they do not interfere with multi-criteria decision making. This dissertation studies parallel skyline algorithms for massive data in a cloud computing environment, focusing on the subspace skyline algorithm, the dynamic skyline algorithm, and the metric skyline algorithm. The work aims to optimize these algorithms for cloud computing and to apply them to network monitoring and image retrieval. Our major contributions are summarized as follows.

For the static skyline problem, we propose an efficient subspace skyline algorithm in the MapReduce framework. Because massive data incurs high processing and communication costs, we propose a grid-based pruning strategy that eliminates non-skyline points in advance. We also propose two methods, SQM-filtering and ?-filtering, that filter out some skyline points according to user preference. They reduce the size of the result set and make decisions easier for users, especially mobile users. Finally, we implement the proposed algorithm in the MapReduce framework and verify its effectiveness through a series of experiments on different data distributions. Experimental results show that the proposed methods improve the efficiency of subspace skyline queries.

For the dynamic skyline problem, the distances between the query point and the other points in the dataset must be recalculated whenever a new query point arrives. Moreover, the dynamic skyline operator is difficult to perform in cloud computing because of distributed storage and parallel processing. As a result, dynamic skyline computation is too costly and is not suitable for real-time processing in cloud computing. To address these issues, we propose a dynamic skyline algorithm using MapReduce. We introduce the coarse global skyline cell to perform the dynamic skyline operator quickly: the points in global skyline cells are selected as candidates for further calculation, which saves a lot of computing time. Finally, the dynamic skyline algorithm is applied to the detection of abnormal conditions in network monitoring.

For the metric skyline problem over image datasets, most related work builds its models on a semantic vector space, which leads to higher computational complexity in image retrieval. We propose a metric skyline algorithm based on the fusion of multiple image features. Unlike other algorithms, ours uses low-level features to describe images and constructs the similarity vectors with BOW (Bag of Words). We then apply the skyline operator to retrieve images in the new metric space. Compared with traditional methods, our algorithm incorporates the multi-dimensional feature similarity vectors into the skyline operator and does not assign a different weight to each feature. The results therefore include not only images that are similar to the query image in multiple cues, but also images that are similar in a single cue. Most traditional feature fusion methods require many feature weights, and their weight assignments lack adaptability. Our method is thus simple and general for image retrieval. Finally, we verify the effectiveness and scalability of the proposed algorithm through a series of experiments.
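
To make the three skyline variants named above concrete, the following is a minimal single-machine sketch in Python. It is not the dissertation's MapReduce implementation and omits the grid-based pruning, the global skyline cells, and the filtering methods. It assumes that smaller values are better in every dimension, and the function names (`dominates`, `skyline`, `subspace_skyline`, `dynamic_skyline`) and the sample data are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def _skyline_by(points: List[Point], key: Callable[[Point], Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep each point whose key is not dominated
    by the key of any other point."""
    return [p for p in points if not any(dominates(key(q), key(p)) for q in points)]


def skyline(points: List[Point]) -> List[Point]:
    """Static skyline over all dimensions."""
    return _skyline_by(points, lambda p: p)


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Skyline computed only on the selected dimensions (a subspace)."""
    return _skyline_by(points, lambda p: tuple(p[d] for d in dims))


def dynamic_skyline(points: List[Point], query: Point) -> List[Point]:
    """Dynamic skyline: compare points by their per-dimension distance
    to the query point, so the distances change with every new query."""
    return _skyline_by(points, lambda p: tuple(abs(x - c) for x, c in zip(p, query)))


# Hypothetical data: (price, distance) pairs.
hotels: List[Point] = [(50, 8), (60, 3), (80, 1), (90, 4), (70, 9)]

print(skyline(hotels))                    # [(50, 8), (60, 3), (80, 1)]
print(subspace_skyline(hotels, [0]))      # [(50, 8)]
print(dynamic_skyline(hotels, (70, 4)))   # [(60, 3), (90, 4), (70, 9)]
```

The dynamic variant shows why each new query point is costly: the per-dimension distances must be recomputed for every point before any dominance test can run. A metric skyline follows the same pattern, with the key replaced by a vector of distances or similarities to one or more query objects.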
Keywords/Search Tags: Cloud Computing, Massive Data, MapReduce, Skyline, Pruning