
The Research And Implementation Of Diversity Demand Oriented Parallel Computing Model

Posted on: 2015-06-25 | Degree: Master | Type: Thesis
Country: China | Candidate: P Z Liu | Full Text: PDF
GTID: 2428330491460283 | Subject: Computer Science and Technology
Abstract/Summary:
With the advance of information technology, the amount of data that people can acquire and store has grown explosively over the last decade, and processing massive data has become a routine requirement for many companies and institutions. This thesis describes MapReduce, currently the most popular model for massive data processing, and analyzes its shortcomings in depth, particularly its dataflow and its static partitioning of intermediate data. The single, fixed dataflow makes the model unsuitable for iterative computation and introduces unnecessary Map phases and DFS I/O. Static partitioning easily causes data skew and an ill-suited allocation of Reduce instances.

To address these problems, this thesis proposes Comprehensive MapReduce (CMR), a generalization of MapReduce: any job that MapReduce can handle can also be handled by CMR, while CMR significantly improves on MapReduce in handling iterative tasks and in dynamic flow control. In CMR, an iterative task can be executed within a single job. Users connect Functions with dataflows and can define the condition under which an iterative task stops, and each Function can accept multiple inputs and produce multiple outputs. Overall, CMR offers significant advantages over Hadoop in both efficiency and usability.

The thesis also presents experiments comparing the efficiency of CMR and Hadoop. The results show that CMR significantly outperforms Hadoop on iterative tasks.
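To make the data-skew problem of static partitioning concrete, the Python sketch below assigns intermediate keys to reducers with a static hash partitioner, in the spirit of Hadoop's default `HashPartitioner`. It is an illustration under assumptions, not code from the thesis: the hash is approximated with `zlib.crc32` instead of Java's `hashCode`, and the skewed record set is invented.

```python
import zlib
from typing import Iterable, List, Tuple


def hash_partition(key: str, num_reducers: int) -> int:
    # Static partitioning modeled on Hadoop's default HashPartitioner:
    # the target reducer depends only on the key and the reducer count,
    # both fixed before the job starts. crc32 is used as a deterministic hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[int]:
    # Count how many intermediate records each reducer would receive.
    loads = [0] * num_reducers
    for key, _value in pairs:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


# Hypothetical skewed intermediate data: one "hot" key dominates.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]

loads_4 = reducer_loads(records, 4)
loads_8 = reducer_loads(records, 8)
print("4 reducers:", loads_4)
print("8 reducers:", loads_8)
```

Because every record with the same key is routed to the same reducer, whichever reducer owns `"hot"` receives at least 900 of the 1,000 records whether the job uses 4 or 8 reducers. Raising the reducer count, which has to be fixed before the job runs, does not relieve the hot spot.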
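The abstract describes CMR only at a high level. The Python sketch below is a hypothetical illustration of the ideas it names: Functions connected by named dataflows, Functions with multiple inputs and outputs, and an iterative task that runs inside one job until a user-defined stop condition holds. The names `Function`, `IterativeJob`, and `flows`, the in-memory execution, and the PageRank workload on a three-node graph are all assumptions for illustration, not CMR's actual API.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical: named dataflows shared by all Functions of one job, kept in memory.
Flows = Dict[str, object]


class Function:
    """Hypothetical CMR-style Function with multiple named inputs and outputs."""

    def __init__(self, inputs: Sequence[str], outputs: Sequence[str], body: Callable):
        self.inputs = list(inputs)
        self.outputs = list(outputs)
        self.body = body  # takes one argument per input, returns a tuple of outputs

    def run(self, flows: Flows) -> None:
        # Read all inputs first, then write outputs, so a Function may overwrite a flow it reads.
        results = self.body(*(flows[name] for name in self.inputs))
        for name, value in zip(self.outputs, results):
            flows[name] = value


class IterativeJob:
    """Hypothetical: runs a chain of Functions repeatedly inside ONE job until `stop` holds."""

    def __init__(self, functions: List[Function], stop: Callable[[Flows], bool], max_iters: int = 1000):
        self.functions = functions
        self.stop = stop
        self.max_iters = max_iters

    def run(self, flows: Flows) -> int:
        for iteration in range(1, self.max_iters + 1):
            for fn in self.functions:
                fn.run(flows)
            if self.stop(flows):  # user-defined termination condition
                return iteration
        return self.max_iters


DAMPING = 0.85


def contribute(ranks, links):
    # Two inputs -> one output: rank mass each node receives from its in-links.
    contrib = {node: 0.0 for node in links}
    for src, dests in links.items():
        for dst in dests:
            contrib[dst] += ranks[src] / len(dests)
    return (contrib,)


def update(contrib, ranks):
    # Two inputs -> two outputs: the new ranks and the largest per-node change.
    n = len(ranks)
    new_ranks = {node: (1 - DAMPING) / n + DAMPING * contrib[node] for node in ranks}
    delta = max(abs(new_ranks[node] - ranks[node]) for node in ranks)
    return new_ranks, delta


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
flows: Flows = {"links": links, "ranks": {node: 1 / len(links) for node in links}}

job = IterativeJob(
    functions=[
        Function(["ranks", "links"], ["contrib"], contribute),
        Function(["contrib", "ranks"], ["ranks", "delta"], update),
    ],
    stop=lambda f: f["delta"] < 1e-10,
)
iterations = job.run(flows)
print(iterations, flows["ranks"])
```

In a plain MapReduce setting, each PageRank iteration would typically be a separate job whose output is written to and re-read from the DFS. In this sketch the intermediate `ranks` flow stays inside one job across iterations, and the loop ends as soon as the stop predicate on `delta` holds.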
Keywords/Search Tags: massive data processing, parallel computing model, MapReduce, cloud computing, iterative tasks