MapReduce Speculation Execution Algorithm In Heterogeneous Environments

Posted on: 2017-05-09 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Ye | Full Text: PDF
GTID: 2428330488479904 | Subject: Computer technology
Abstract/Summary:
With the continuous development of network information technology, the Internet has penetrated all walks of life, and the number of Internet users keeps rising. This has led to explosive growth in Internet data and created new opportunities for distributed computing. MapReduce is a programming model for distributed parallel computing, proposed by Google to process large-scale data. It parallelizes jobs automatically, is easy to use, and is highly reliable. Hadoop is an open-source distributed parallel computing platform based on MapReduce. Because it is simple to customize and use, it is widely used by research institutions and businesses for research on and processing of large data sets. Hadoop uses MapReduce to process Big Data and HDFS (Hadoop Distributed File System) to store it.

Speculative execution safeguards the efficiency and robustness of computing and storage. It finds unusually slow tasks and launches a speculative copy of each such task on another machine, with the goals of reducing task execution time and saving cluster resources. Existing speculative execution strategies include the heuristic-based LATE and the adaptive MapReduce strategy SAMR. This thesis first summarizes and analyzes the problems in the LATE strategy: LATE decides whether to launch speculative execution by comparing a task's progress rate with the average progress rate of successful tasks, but it does not consider resource consumption or variation in load. Speculative Execution for Benefit of Cluster (SEBC) instead decides on speculative execution based on the type of node and the benefit to the overall cluster. Experiments show that SEBC estimates the remaining running time of tasks more accurately than LATE and improves cluster performance more than LATE does.

After a deep analysis of the workflow of SAMR and its existing problems, this thesis puts forward a new strategy, Speculative Execution Based on Random Forest (SERF). The SERF model is built on YARN, the new generation of the Hadoop platform. It uses the random forest machine learning algorithm to predict the status of the overall cluster, and it divides map tasks into three types: nodeLocal, rackLocal, and offSwitch. In addition, it addresses the existing problems in selecting fast nodes to execute backup tasks. The experimental results show that SERF achieves a higher speculation success rate and a larger cluster performance improvement than SAMR.
Keywords/Search Tags:MapReduce Algorithm, Speculative Execution, Hadoop, MapReduce
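
The progress-rate comparison that the abstract attributes to LATE can be illustrated with a minimal Python sketch. It is not the thesis's implementation, nor Hadoop's. The names `RunningTask`, `pick_speculation_candidate`, and the `slow_fraction` threshold are hypothetical. The sketch assumes a task's progress rate is its progress score divided by elapsed time, and that remaining time is estimated as (1 - progress) / rate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    """Hypothetical record for a task that is still running."""
    task_id: str
    progress: float   # fraction of work completed, 0.0 to 1.0
    elapsed_s: float  # seconds since the task started

    @property
    def progress_rate(self) -> float:
        # Progress made per second; 0.0 if the task has only just started.
        return self.progress / self.elapsed_s if self.elapsed_s > 0 else 0.0

    @property
    def est_time_left_s(self) -> float:
        # Remaining work divided by the observed rate.
        rate = self.progress_rate
        return float("inf") if rate == 0 else (1.0 - self.progress) / rate


def pick_speculation_candidate(
    tasks: List[RunningTask],
    avg_success_rate: float,
    slow_fraction: float = 0.5,  # assumed threshold, not taken from the thesis
) -> Optional[RunningTask]:
    """Return the slow task expected to finish last, or None if none is slow.

    A task counts as slow when its progress rate falls below
    slow_fraction * avg_success_rate, where avg_success_rate is the
    average progress rate of tasks that already completed successfully.
    """
    slow = [t for t in tasks if t.progress_rate < slow_fraction * avg_success_rate]
    return max(slow, key=lambda t: t.est_time_left_s, default=None)


tasks = [
    RunningTask("a", progress=0.9, elapsed_s=100.0),
    RunningTask("b", progress=0.2, elapsed_s=100.0),
    RunningTask("c", progress=0.1, elapsed_s=100.0),
]
candidate = pick_speculation_candidate(tasks, avg_success_rate=0.01)
print(candidate.task_id if candidate else "no speculation needed")
```

The sketch looks only at progress rates, which is the limitation the abstract points out: it ignores resource consumption, load variation, and the type of node that would run the backup copy.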