Performance Research On Sorting Algorithm In Cloud Computing Environment

Posted on: 2015-09-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Ding
Full Text: PDF
GTID: 2428330488499845
Subject: Software engineering
Abstract/Summary:
As the volume of data in cloud computing environments grows rapidly, there is an urgent need to analyze and process it quickly and effectively. Sorting large-scale data efficiently in the cloud is a significant problem, and two questions stand out: whether widely used sorting algorithms can achieve high performance there, and how many cloud computing resources they consume. This thesis studies sorting algorithms for the Hadoop platform that are fast and efficient, balance the cluster load well, and consume few resources. The main contributions are as follows:

1) Analysis of several highly efficient serial sorting algorithms. After studying the MapReduce programming framework and the architecture of Hadoop, the thesis implements Radix sort, Quicksort and Sample sort on the Hadoop platform, and then compares the design ideas and algorithmic complexity of the three algorithms in serial and parallel systems.

2) Analysis and comparison of the efficiency, CPU consumption, memory consumption and communication of Radix sort, Quicksort and Sample sort on the Hadoop platform. A large number of experiments show that, compared with Radix sort and Quicksort, Sample sort offers higher sorting speed, better load balance and lower CPU consumption. This result provides a basis for designing more efficient, energy-saving algorithms in cloud computing environments.

3) To sort unevenly distributed datasets efficiently, the thesis proposes the Randomized Partition Sample Sort (RPSS) algorithm, which aims at high efficiency and good load balance. Sample sort is used extensively for data processing in cloud computing. It sorts large datasets efficiently by partitioning the data into several buckets and sorting all buckets in parallel, on the assumption that the data is evenly partitioned. However, many of the datasets processed today are non-uniformly distributed, and the performance of the standard sample sort algorithm deteriorates quickly when partition sizes vary widely. RPSS introduces a randomized partition function so that data of uneven density is distributed relatively uniformly across the reduce tasks. A large number of experiments on Hadoop show that RPSS sorts non-uniform datasets more efficiently than sample sort, with improved load balance and a reduced failure rate.
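The bucket-based scheme that the abstract attributes to standard sample sort can be illustrated with a minimal single-process sketch. This is not the thesis's Hadoop implementation and it does not reproduce RPSS, whose partition function the abstract does not specify. The function name `sample_sort`, the `oversample` parameter and the way splitters are chosen are assumptions made for illustration only.

```python
import bisect
import random


def sample_sort(data, num_buckets, oversample=8, seed=0):
    """Illustrative sample sort; returns (sorted_list, bucket_sizes).

    Hypothetical helper, not taken from the thesis. Buckets stand in for
    the partitions that separate reduce tasks would sort in parallel.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    if not data:
        return [], [0] * num_buckets

    rng = random.Random(seed)
    # Draw a random sample and sort it; splitters are evenly spaced
    # elements of the sorted sample (an assumed, common choice).
    sample_size = min(len(data), num_buckets * oversample)
    sample = sorted(rng.sample(data, sample_size))
    step = len(sample) / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # Route every key to the bucket whose key range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for key in data:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Sort each bucket independently (sequentially here) and concatenate.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result, [len(b) for b in buckets]


# A skewed input: one key value dominates the dataset.
skewed = [7] * 900 + list(range(100))
ordered, sizes = sample_sort(skewed, num_buckets=4)
print(sizes)
```

Because equal keys always land in the same bucket, the 900 copies of `7` end up in a single bucket in this sketch. This is one simple kind of imbalance that range partitioning can produce on non-uniform data.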
Keywords/Search Tags: Cloud Computing, Hadoop, Sorting Algorithm, MapReduce, Non-uniform Datasets