Research On A Parallel Computing Framework Based On YARN

Posted on:2016-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:M M Zhu

GTID:2308330470976679

Subject:Computer technology

Abstract/Summary:

The core of the Apache Hadoop framework consists of the MapReduce programming model and the HDFS distributed file system: MapReduce provides parallel computation over massive data, while HDFS provides its storage. MapReduce is a parallel programming model used mainly for parallel computation on very large data sets. In the first few years after its release, the model was applied successfully in many cases and gained wide support and recognition in industry. However, as distributed clusters grew in scale and other workloads surged, the problems of the original framework gradually surfaced. The original MapReduce framework needed large-scale changes to address flaws in its memory consumption, scalability, threading model, reliability, and performance. Over the past few years the Hadoop team made a number of bug fixes, but the rising cost of each fix showed that the original framework was becoming increasingly difficult to change. To carry Hadoop further and to solve at the root the key problems limiting MapReduce performance, the Apache open source community thoroughly restructured the old MapReduce framework starting with version 0.23.0, changing its architecture fundamentally. The restructured framework is known as Hadoop 2, or YARN.

This thesis first describes in detail the MapReduce programming idea, its working principle, and its concrete steps and methods. It then gives a detailed account of the YARN programming model and the YARN framework, including their working principle and concrete steps and methods. YARN is then compared with MapReduce: the deficiencies of MapReduce are examined and the differences between YARN and MapReduce are outlined. Finally, a Hadoop cluster environment is built and MapReduce parallel computing experiments are run on the YARN framework; the experiments demonstrate the efficiency and reliability of parallel computation under YARN.
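As a rough illustration of the MapReduce programming model the abstract refers to, the sketch below runs the three classic phases (map, shuffle, reduce) on a word count inside a single Python process. It is not taken from the thesis and does not use the Hadoop or YARN APIs; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen only for this example. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop runs MapReduce", "YARN runs MapReduce"]))
```

Because each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, both phases can be distributed across machines, which is what makes the model suitable for large data sets.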

Keywords/Search Tags:

Hadoop, MapReduce, YARN, ID3, Parallel computing

Related items

1	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
2	The Design Of The Cloud Computing System Based On Hadoop
3	Research And Optimization Of Parallel Computing Framework Based On MapReduce
4	Research And Implementation Of Highresponsive Hadoop Computing Resource Scheduler Based On YARN
5	Research On SLA-Aware Energy-Efficient Scheduling Strategy For Hadoop Yarn
6	Research And Implementation Of Parallel Clustering Algorithm Based On Approximate Spectrum Hadoop MapReduce
7	Research On The Energy-Efficient Hadoop YARN Resource Scheduling Strategy Based On State Matrix
8	Researches And Application Of Mapreduce Parallel Programming Model For Cloud Computing
9	Research On MapReduce Program Based On YARN
10	Design And Implementation Of Application System Framework IMSAA Based On Hadoop