Research Of Stragglers Recognition And Processing In MapReduce

Posted on: 2019-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Qiao
GTID: 2428330542996921
Subject: Computer Science and Technology
Abstract/Summary:
With the development of intelligent hardware and software, the volume of data in the world has grown explosively, and distributed computing frameworks such as MapReduce have emerged in response. In the MapReduce framework, a job is divided into multiple tasks that are distributed to multiple nodes and executed in parallel to speed up job completion. During execution, however, some tasks run abnormally slower than the others and delay completion of the whole job; these tasks are called stragglers. Speculative execution is a common way to address the straggler problem: stragglers are backed up on alternative machines in the expectation that the speculative copies will finish before the original tasks. A speculative execution strategy therefore consists of two steps: identifying the stragglers in a job and selecting appropriate backup nodes.

Existing speculative execution strategies propose many straggler identification methods. FlexSlot uses the k-means clustering algorithm to identify stragglers. However, FlexSlot always reports stragglers whether or not the job actually contains any, which results in low recognition accuracy. This thesis analyzes the causes of that low accuracy, improves on FlexSlot, and proposes a straggler recognition strategy based on clustering optimization. First, to find task partitions that better reflect the actual state of task execution, a range is specified manually for the k of k-means. For each k in this range, tasks are clustered in parallel by progress rate and processing bandwidth, yielding multiple clustering results. Second, the Davies-Bouldin index (DBI) is used to select the best task partition. Third, the number of stragglers is capped by the number of tasks in the job and the number of idle resources, so that most normal tasks are not identified as stragglers. Finally, the slowest task class is required to be at least a times slower than the second-slowest task class, which guarantees that the identified stragglers are indeed slow.

Appropriate backup nodes must then be selected for the stragglers. When selecting a backup node, existing speculative execution strategies either avoid nodes with poor performance or choose the node by predicting the backup time of the speculative task. The prediction-based methods, however, usually rely on historical information about tasks that completed on the node and ignore the actual resource demands of the backup task, so they cannot predict the backup time well. This thesis therefore proposes a node searching model based on the Dijkstra algorithm. First, using the resource allocation and processing bandwidth information of all tasks in the same job, a resource speed model is built by linear regression to predict the processing bandwidth of the backup task on each candidate backup node, from which the processing time of the backup task is obtained. Second, all nodes in the cluster are treated as the vertices of an undirected graph, and the processing time of the backup task and the data migration costs are treated as the weights between vertices. Finally, two search strategies are applied to obtain the shortest backup time and the optimal backup node.

Experimental results show that, under multiple workloads, the clustering optimization based straggler recognition strategy proposed in this thesis achieves higher straggler recognition accuracy than FlexSlot and MCP. The Dijkstra algorithm based node searching model reduces job completion time by about 10 percent compared to FlexSlot and by about 20 percent compared to MCP. The speculation success rate of the proposed speculative execution strategy improves by up to 12.4 percent compared to FlexSlot and by up to 48.8 percent compared to MCP. In terms of resource utilization, the proposed strategy is also superior to FlexSlot and MCP.
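The four identification steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the deterministic k-means initialization, the `max_fraction` cap, and the exact form of the "a times slower" test are assumptions made here, and the candidate values of k are tried sequentially rather than in parallel. Each task is assumed to be a `(progress_rate, processing_bandwidth)` tuple already scaled to comparable ranges, with `k_min >= 2`.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic start (centers spread over sorted input)."""
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin index over non-empty clusters; lower means a better partition."""
    ids = sorted(set(labels))
    if len(ids) < 2:
        return math.inf
    scatter = {}
    for c in ids:
        members = [p for p, l in zip(points, labels) if l == c]
        scatter[c] = sum(math.dist(p, centers[c]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in ids if j != i) for i in ids]
    return sum(worst) / len(ids)

def best_partition(points, k_min, k_max):
    """Steps 1 and 2: cluster for every k in the range, keep the lowest-DBI result."""
    best = None
    for k in range(k_min, min(k_max, len(points)) + 1):
        labels, centers = kmeans(points, k)
        score = davies_bouldin(points, labels, centers)
        if best is None or score < best[0]:
            best = (score, k, labels, centers)
    return best

def find_stragglers(points, k_min, k_max, idle_slots, a=1.5, max_fraction=0.25):
    """Steps 3 and 4: cap the straggler count and require a clear speed gap."""
    _, _, labels, centers = best_partition(points, k_min, k_max)
    # Rank non-empty clusters by mean progress rate (first feature of the centroid).
    ids = sorted(set(labels), key=lambda c: centers[c][0])
    if len(ids) < 2:
        return []
    slowest, second = ids[0], ids[1]
    # Assumed reading of "a times slower": the second-slowest class must
    # progress at least a times faster than the slowest class.
    if centers[second][0] < a * centers[slowest][0]:
        return []
    # Assumed cap: bounded by idle slots and by a fraction of the job's tasks.
    cap = min(idle_slots, int(max_fraction * len(points)))
    candidates = sorted((i for i, l in enumerate(labels) if l == slowest),
                        key=lambda i: points[i][0])
    return candidates[:cap]

tasks = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 0.95),
         (1.05, 1.0), (0.95, 1.05), (0.2, 0.25), (0.25, 0.2)]
print(find_stragglers(tasks, 2, 4, idle_slots=3))      # indices of the two slow tasks
print(find_stragglers(tasks[:6], 2, 4, idle_slots=3))  # no clear gap, so no stragglers
```

The second call illustrates the point made against FlexSlot: when every task runs at a similar speed, the gap test rejects the slowest cluster instead of reporting stragglers that do not exist.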
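The backup node search can be illustrated in a similar spirit. The abstract does not spell out the graph weights or the two search strategies, so the sketch below makes its own assumptions: edge weights are data migration costs only, the predicted processing time is added at the destination node, the resource speed model is a single-feature least-squares line from allocated resource to processing bandwidth, and all names (`fit_line`, `pick_backup_node`, `free_resource`) are hypothetical.

```python
import heapq

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dijkstra(graph, source):
    """Shortest migration cost from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def pick_backup_node(graph, source, input_size, free_resource, model):
    """Choose the node minimizing migration cost plus predicted processing time."""
    slope, intercept = model
    migration = dijkstra(graph, source)
    best_node, best_time = None, float("inf")
    for node, cost in migration.items():
        if node == source or node not in free_resource:
            continue
        bandwidth = slope * free_resource[node] + intercept  # predicted by the model
        if bandwidth <= 0:
            continue
        total = cost + input_size / bandwidth
        if total < best_time:
            best_node, best_time = node, total
    return best_node, best_time

# Resource speed model fitted from tasks of the same job (made-up sample values).
model = fit_line([1, 2, 3, 4], [10, 20, 30, 40])

# Undirected graph: each edge appears in both directions.
graph = {
    "A": {"B": 2, "C": 10},
    "B": {"A": 2, "C": 3, "D": 8},
    "C": {"A": 10, "B": 3, "D": 1},
    "D": {"B": 8, "C": 1},
}
free_resource = {"B": 1, "C": 4, "D": 2}
print(pick_backup_node(graph, "A", 200, free_resource, model))
```

In this toy cluster the straggler runs on node A. Node B is the cheapest to reach but has the least free resource, so the combined estimate favors node C, which is farther away but processes the data faster.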
Keywords/Search Tags: MapReduce, stragglers, clustering optimization, resource speed model, Dijkstra algorithm