The Research And Implementation Of Diversity Demand Oriented Parallel Computing Model

Posted on:2015-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:P Z Liu

Full Text:PDF

GTID:2428330491460283

Subject:Computer Science and Technology

Abstract/Summary:

With advances in information technology, the volume of data that can be acquired and stored has grown explosively over the last decade, and processing massive data has become a daily requirement for many companies and institutions. This thesis describes MapReduce, currently the most popular model for massive data processing, and analyzes its shortcomings in depth, particularly its dataflow and its static partitioning of intermediate data. The single, fixed dataflow makes the model ill-suited to iterative computation and leads to unnecessary Map phases and DFS I/O. Static partitioning easily causes data skew and poorly suited Reduce instances. To address these problems, the thesis proposes Comprehensive MapReduce (CMR), a generalization of MapReduce: any work that MapReduce can do can also be handled by CMR. CMR improves significantly on MapReduce in handling iterative tasks and dynamic flow control. In CMR, an iterative task can be executed within a single job. Users connect Functions with dataflows and define the condition under which an iterative task stops. A Function can take multiple inputs and produce multiple outputs. CMR offers significant advantages over Hadoop in efficiency and usability. The thesis also presents experiments comparing the efficiency of CMR and Hadoop; the results show that CMR significantly outperforms Hadoop on iterative tasks.
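The two ideas in the abstract, skew from static partitioning and an iterative task that runs inside one job until a user-defined condition holds, can be illustrated with a small single-process sketch. This is not CMR's API and not Hadoop's; every name below (`static_partition`, `map_reduce_step`, `run_iterative`, `should_stop`) is hypothetical, and the in-memory loop only stands in for what a real system would do across machines.

```python
from collections import Counter, defaultdict


def static_partition(keys, num_reducers):
    """Count records per reducer under a fixed rule (integer key mod R).

    Assumption: keys are non-negative integers; the rule is chosen before
    the data is seen, so a frequent key overloads one reducer.
    """
    loads = Counter(k % num_reducers for k in keys)
    return [loads[r] for r in range(num_reducers)]


def map_reduce_step(records, map_fn, reduce_fn):
    """One in-memory map -> group-by-key -> reduce pass."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in groups.items()]


def run_iterative(records, map_fn, reduce_fn, should_stop, max_iters=100):
    """Repeat the pass inside a single 'job' until should_stop says so.

    Intermediate results stay in memory between iterations, standing in
    for avoiding a distributed-file-system write and read per iteration.
    """
    for iteration in range(1, max_iters + 1):
        new_records = map_reduce_step(records, map_fn, reduce_fn)
        if should_stop(records, new_records):
            return new_records, iteration
        records = new_records
    return records, max_iters


# Skew: key 0 occurs eight times, so reducer 0 receives most of the data.
reducer_loads = static_partition([0] * 8 + [1, 2, 3], num_reducers=4)

# Iteration: halve every value until all values drop below 1.
result, iterations = run_iterative(
    [("a", 8.0), ("b", 4.0)],
    map_fn=lambda rec: [(rec[0], rec[1] / 2)],
    reduce_fn=lambda key, values: (key, sum(values)),
    should_stop=lambda old, new: all(v < 1 for _, v in new),
)
```

In a plain MapReduce deployment, each pass of such a loop would typically be submitted as a separate job by a driver program, which is the overhead the abstract attributes to the fixed dataflow.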

Keywords/Search Tags:

massive data processing, parallel computing model, MapReduce, cloud computing, iterative tasks

Related items

1	Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments
2	Performance Optimization And Applications Of MapReduce In Cloud Computing
3	Research On Parallel Skyline Algorithms And Their Applications In Cloud Computing Environment
4	The Research Of Parallel Clustering Algorithm Of Massive Data In Cloud Computing Environment
5	The Research And Implementation Of Comprehensive Mapreduce
6	Study On Parallel Algorithm Of Large-scale Numerical Calculation In Cloud Computing Environment
7	Researches And Application Of Mapreduce Parallel Programming Model For Cloud Computing
8	Research On Mapreduce Parallel Computing Platform For Cloud Computing
9	Research On MapReduce Parallel Programming Model In The Cloud Computing
10	Research On Parallel Approaches For Processing Massive High Speed Rail Noise Data Based On Cloud Computing