
Fault Tolerance For MapReduce In The Cloud Environment

Posted on: 2013-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhu
GTID: 2218330362959403
Subject: Computer application technology

Abstract/Summary:
Cloud computing has become one of the most important technologies in today's computer industry. With the rapid development of cloud technologies, data has shifted from the traditional structured form to semi-structured and unstructured forms, and at the same time its volume has exploded. Traditional database technology can no longer cope with data at this scale, so handling this Big Data has become a pressing problem. In 2004, Google presented its solution, MapReduce, to deal with the challenges created by the massive data sets of the cloud era.

In short, MapReduce is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware. It not only handles massive amounts of data to solve performance problems, but also simplifies the way programmers develop distributed parallel programs. More importantly, MapReduce solves the scalability and reliability issues that are its biggest advantages over traditional databases. A variety of research, both in China and abroad, has grown up around the emerging MapReduce programming framework, and fault tolerance has been one of the hottest topics in this area. Existing work on fault tolerance falls into two directions: re-execution and backup. Both attempt to improve recovery mechanisms, which can take effect only once a failure has been detected and located; if the cluster is unaware of the failure, neither approach delivers the expected performance. This thesis therefore studies the fault tolerance of MapReduce from a new perspective: how to detect a failed node in a MapReduce cluster faster and more accurately.

To address this problem, the thesis proposes two ideas: an adaptive expiry time and a reputation-based detection model. The adaptive expiry time replaces the rigid, fixed expiry time of the MapReduce cluster: it first estimates the execution time of each job and then adapts the expiry time to that estimate. At runtime, if the JobTracker receives no heartbeat message from a node within the adaptive expiry time, that node is considered failed. The reputation-based detection model gives each node a reputation value and decrements it whenever a reduce task reports a remote fetch failure against one of the node's map tasks. If a node's reputation decays to a lower limit because of too many remote fetch failures, that node is considered failed.

Extensive experimental data shows that the two proposed solutions perform much better than the original Hadoop cluster: when a node in the cluster fails, they significantly reduce the time needed to detect the failure compared with stock Hadoop. Comparative experiments further show that the adaptive expiry time favors short jobs, while the reputation-based detection model benefits large jobs. The two solutions also work effectively with existing fault-tolerance techniques, making Hadoop better at fault tolerance overall: able not only to locate failures quickly but also to recover from them quickly. The main contribution of this thesis is therefore not limited to the adaptive expiry time and the reputation-based detection model themselves; it also widens the research directions for fault-tolerant Hadoop.
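The two detection mechanisms can be summarized in a short sketch. The Java fragment below is a minimal illustration, not code from the thesis or from Hadoop itself: the class and method names (NodeMonitor, onHeartbeat, onFetchFailure) and all constants are hypothetical, and the expiry formula simply scales the estimated job execution time, standing in for whatever estimator the thesis actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative failure detector combining the two ideas in the abstract:
 * (1) an expiry time adapted to the estimated job execution time, and
 * (2) a per-node reputation decremented on reduce-side fetch failures.
 * All names and constants are hypothetical placeholders.
 */
public class NodeMonitor {
    private static final double EXPIRY_FACTOR      = 0.1;    // fraction of estimated job time
    private static final long   MIN_EXPIRY_MS      = 30_000; // lower bound on the timeout
    private static final int    INITIAL_REPUTATION = 10;
    private static final int    REPUTATION_FLOOR   = 0;      // failed at or below this value

    private final Map<String, Long>    lastHeartbeat = new ConcurrentHashMap<>();
    private final Map<String, Integer> reputation    = new ConcurrentHashMap<>();

    /** Expiry time adapted to the job: longer jobs tolerate longer silences. */
    public long adaptiveExpiryMs(long estimatedJobMs) {
        return Math.max(MIN_EXPIRY_MS, (long) (estimatedJobMs * EXPIRY_FACTOR));
    }

    /** Record a heartbeat received from a worker node. */
    public void onHeartbeat(String node, long nowMs) {
        lastHeartbeat.put(node, nowMs);
        reputation.putIfAbsent(node, INITIAL_REPUTATION);
    }

    /** A reduce task failed to fetch map output from this node: decay its reputation. */
    public void onFetchFailure(String mapNode) {
        reputation.merge(mapNode, -1, Integer::sum);
    }

    /** Failed if the node missed the adaptive expiry or its reputation hit the floor. */
    public boolean isFailed(String node, long nowMs, long estimatedJobMs) {
        long last = lastHeartbeat.getOrDefault(node, nowMs);
        boolean expired     = nowMs - last > adaptiveExpiryMs(estimatedJobMs);
        boolean blacklisted = reputation.getOrDefault(node, INITIAL_REPUTATION) <= REPUTATION_FLOOR;
        return expired || blacklisted;
    }
}
```

On this reading, the two checks complement each other: the adaptive expiry catches nodes that stop heartbeating entirely, while the reputation floor catches nodes that still heartbeat but repeatedly fail to serve map output.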
Keywords/Search Tags: MapReduce, Hadoop, massive data processing, parallel computing, adaptive