Research On Improving The Fault Tolerance Performance In MapReduce

Posted on: 2015-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: H C Wu
Full Text: PDF
GTID: 2428330488499659
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of network information technology, the Internet has penetrated all walks of life, and the number of Internet users keeps rising. This has led to explosive growth in Internet data and created new opportunities for distributed computing. MapReduce is a programming model for distributed parallel computing, proposed by Google for processing large-scale data. It parallelizes jobs automatically, is highly reliable, and is easy to use. Hadoop is an open-source distributed parallel computing platform based on MapReduce. Because it is simple to customize and use, it is widely used by businesses and research institutions to process and study large data sets. Hadoop uses HDFS (Hadoop Distributed File System) and MapReduce to store and process Big Data.

Hardware failures are common in MapReduce clusters, so a fault-tolerance mechanism is essential to the robustness and efficiency of storage and computation. Speculative execution is an important means of tolerating computation failures. It identifies unusually slow tasks (stragglers) and launches a speculative copy of each on another machine, which reduces task execution time and saves cluster resources. Existing speculative execution strategies include the heuristic-based strategy LATE (Longest Approximate Time to End) and the cost-benefit strategy MCP (Maximum Cost Performance), which is based on cluster resources. The two strategies differ in their implementation principles: LATE is easier to implement than MCP, but MCP is more effective.

We summarize and analyze the problems in the LATE strategy. In particular, LATE estimates the remaining execution time of running tasks inaccurately because it does not consider the impact of dynamic system load on task running time. To address this, we propose a system-load-aware heuristic speculative execution strategy, ERUL (Estimate Remain time Using Linear relationship). It takes advantage of the observation that system load and the execution time of a CPU-bound task have a linear relationship, and uses this relationship to estimate a task's remaining execution time more accurately. ERUL also addresses other problems in LATE: LATE cannot handle data skew in Map tasks, takes a long time to identify stragglers, and cannot evaluate node performance accurately. Experiments show that MCP estimates the remaining running time of tasks more accurately than LATE and improves cluster performance more than LATE does.

Based on an in-depth analysis of the workflow of MCP and its existing problems, we propose an improved strategy, exMCP (extensional MCP), for heterogeneous distributed environments. When calculating cluster resources, the MCP model does not take into account that slots on heterogeneous nodes differ in value, so the model makes mistakes in such environments. MCP also does not classify map tasks according to whether they satisfy data locality, which causes all tasks that do not satisfy data locality to be identified as stragglers. exMCP calculates cluster resources with the value of slots taken into account and classifies map tasks before speculative execution. In addition, it improves the selection of fast nodes on which to execute backup tasks. The experimental results show that exMCP achieves a higher speculation success rate and a larger cluster performance improvement than MCP.
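
The remaining-time estimates discussed in the abstract can be illustrated with a minimal Python sketch. The first function follows the commonly described LATE heuristic (progress rate = progress score / elapsed time, time left = (1 - progress score) / progress rate). The other two are only an assumption about what a load-aware estimate might look like: the abstract states that system load and CPU-bound execution time are linearly related but gives no formula, so the function names, the least-squares fit, and the sample history below are hypothetical and are not the thesis's ERUL implementation.

```python
from statistics import mean


def late_time_left(progress_score, elapsed_seconds):
    """LATE-style estimate: (1 - progress) / (progress / elapsed)."""
    if not 0.0 < progress_score <= 1.0 or elapsed_seconds <= 0.0:
        raise ValueError("need 0 < progress_score <= 1 and elapsed_seconds > 0")
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate


def fit_line(samples):
    """Least-squares fit of duration = slope * load + intercept.

    samples: (load, full_task_duration_seconds) pairs. This history is a
    hypothetical input; the abstract does not say how it is collected.
    """
    loads = [load for load, _ in samples]
    durations = [duration for _, duration in samples]
    load_mean, duration_mean = mean(loads), mean(durations)
    spread = sum((load - load_mean) ** 2 for load in loads)
    if spread == 0.0:
        raise ValueError("need samples taken at two or more distinct loads")
    covariance = sum(
        (load - load_mean) * (duration - duration_mean)
        for load, duration in samples
    )
    slope = covariance / spread
    return slope, duration_mean - slope * load_mean


def load_aware_time_left(progress_score, current_load, slope, intercept):
    """Illustrative assumption, not the thesis's formula: predict the full
    duration at the current load and keep the unfinished fraction of it."""
    if not 0.0 <= progress_score <= 1.0:
        raise ValueError("progress_score must be in [0, 1]")
    predicted_full_duration = slope * current_load + intercept
    return (1.0 - progress_score) * predicted_full_duration
```

With these definitions, `late_time_left(0.25, 30.0)` yields roughly 90 seconds regardless of load. The load-aware variant instead returns a larger estimate when `current_load` rises, which is the kind of sensitivity to dynamic system load that the abstract says LATE lacks.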
Keywords/Search Tags: MapReduce Fault Tolerance, Speculative Execution, Hadoop, MapReduce, Hadoop Scheduling