
Research and Implementation of a MapReduce Fault Tolerance Method Based on Intermediate Result Checkpoints

Posted on: 2018-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: K Ding
Full Text: PDF
GTID: 2348330515455332
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the amount of data generated by networks has increased explosively. Traditional storage and computing patterns can no longer satisfy the storage and computing requirements of applications. Cloud computing, relying on its distributed processing technology, has become the most popular data processing technology. Within it, MapReduce, an efficient parallel computing framework, has been widely applied in the field of big data processing.

At present, there are two common failure types in the MapReduce model: task failure and node failure. MapReduce handles task failure by re-execution, that is, a task is re-allocated after it fails. This wastes a large amount of computing resources, extends the average task completion time, and reduces computational efficiency. Node failure is divided into Master node failure and Worker node failure. For Master node failure, MapReduce adopts a duplex fault tolerance method. A Worker node failure not only causes the loss of the intermediate results that Map tasks generated and stored on that Worker node, but also leads to the re-assignment and re-execution of tasks. Currently, there is no approach for dealing with this failure type in the MapReduce model.

This thesis mainly completes the following three aspects of work.

(1) Analysis of the shortcomings of the MapReduce fault tolerance mechanism in the Hadoop source code: by analyzing the Hadoop source code, we study how task failure and node failure are handled and where this handling falls short. This provides a basis for improving the fault tolerance methods of MapReduce.

(2) Design and implementation of a checkpointing fault tolerance mechanism: for task failures and node failures in the MapReduce computing process, this thesis designs and implements a checkpointing fault tolerance mechanism. The status information and intermediate results of task execution are saved in the form of checkpoint files, and when a task is re-assigned, the saved information is used to recover the task quickly. For task failure, we design and implement a Local Checkpointing fault tolerance mechanism; for node failure, we design and implement Remote Checkpointing and Query Metadata checkpointing fault tolerance mechanisms.

(3) Testing of the checkpointing fault tolerance mechanism: after designing and implementing the checkpointing fault tolerance mechanism, we build a Hadoop cluster, write an application, and inject faults into the application to verify whether the mechanism provides fault tolerance effectively, and to evaluate its efficiency.
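The idea of saving status information and intermediate results in a checkpoint file, then resuming from it when a task is re-assigned, can be illustrated with a small single-process sketch. This is not the thesis implementation and does not use any Hadoop API: the function names, the JSON checkpoint layout, and the parameters `checkpoint_path`, `every`, and `fail_at` are all assumptions made for illustration. It shows a word-count map task that periodically checkpoints its progress, has a failure injected, and then resumes without reprocessing checkpointed records.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    # Write to a temporary file and rename it, so a crash during the
    # write never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_map_task(records, checkpoint_path, every=2, fail_at=None):
    """Word-count map task that checkpoints progress and partial output.

    Hypothetical sketch: all names and parameters are illustrative,
    not taken from Hadoop or from the thesis.
    """
    # Resume from the checkpoint if an earlier attempt left one behind.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next": 0, "counts": {}}

    processed = 0  # records handled by this attempt only
    for i in range(state["next"], len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("injected task failure")
        for word in records[i].split():
            state["counts"][word] = state["counts"].get(word, 0) + 1
        state["next"] = i + 1
        processed += 1
        # Checkpoint every `every` records.
        if state["next"] % every == 0:
            save_checkpoint(state, checkpoint_path)
    save_checkpoint(state, checkpoint_path)
    return state["counts"], processed


def demo():
    records = ["a b", "b c", "c a", "a a", "b b"]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "task_0.ckpt")
        try:
            # First attempt fails before processing record 3.
            run_map_task(records, path, every=2, fail_at=3)
        except RuntimeError:
            pass
        # The re-assigned attempt resumes from the last checkpoint.
        return run_map_task(records, path, every=2)


if __name__ == "__main__":
    counts, processed = demo()
    print(counts, processed)
```

In this sketch the second attempt processes only the three records after the last checkpoint instead of all five, which is the saving that checkpoint-based recovery aims for compared with plain re-execution. A checkpoint kept on the local disk only helps when the task fails and the node survives; covering node failure would require storing the checkpoint somewhere that outlives the node, which this sketch does not attempt.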
Keywords/Search Tags: checkpointing fault-tolerance, intermediate results, Hadoop, MapReduce, cloud computing