Research On The Recognition And Scheduling Algorithms Of Slow Tasks In Hadoop

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: F J Sun
Full Text: PDF
GTID: 2518306335480264
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network technology, large-scale data is generated continuously, and MapReduce, one of the main components of the Hadoop platform, plays an important role in processing it. Under MapReduce, a job is decomposed into multiple small tasks, which are distributed to multiple nodes for execution to improve the efficiency of the job. However, while the job is running, some tasks become slow tasks because of node failures, network congestion and other causes. At present, the main way to handle slow tasks and speed up completion of the job is to first identify the slow tasks, then create backup tasks for them and assign the backups to other nodes for execution. Studying the identification and scheduling of slow tasks in Hadoop is therefore of great significance for improving system performance and shortening job execution time. The main work of the thesis is as follows:

(1) A slow task recognition algorithm based on locally weighted linear regression (LWLR) is proposed. First, tasks are divided into four types: (CPU-intensive and IO-sparse), (CPU-intensive and IO-intensive), (CPU-sparse and IO-intensive), and (CPU-sparse and IO-sparse). Then the relationship between task progress and execution time is established using the LWLR algorithm. Finally, the tasks are sorted by predicted completion time from largest to smallest, and the top 25% are selected as slow tasks. Experimental results show that, compared with the LATE, CPL and ELATE algorithms, the proposed algorithm improves the accuracy of slow task recognition by 27.47%, 7.27% and 4.27% respectively.

(2) A scheduling algorithm for slow tasks based on the fireworks algorithm (FWA) is proposed. When selecting the optimal node, both the success rate and the load of the node are considered. The node load is determined from four influencing factors: CPU usage, IO usage, bandwidth usage and memory usage. The analytic hierarchy process is used to determine the degree of influence of each factor on the load, and the node with the largest ratio of success rate to load is selected as the optimal node. After the optimal node is selected, however, the load may become poorly balanced across nodes. The parallelism of the fireworks algorithm suits task scheduling problems, and the algorithm can also achieve load balancing. Therefore, when the number of backup tasks reaches the threshold relative to the total number of tasks, the fireworks algorithm is introduced to keep the node load balanced, and the backup tasks are scheduled to the nodes that complete them fastest. Experimental results show that the job completion time of the proposed algorithm is 26.18%, 14% and 5.75% lower than that of the LATE, CPL and ELATE algorithms respectively, and its backup task success rate is 20.35%, 3.9% and 1.7% higher than that of the LATE, CPL and ELATE algorithms respectively.
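The two selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions: each task exposes a history of (progress, elapsed seconds) samples, the completion time is taken as the LWLR prediction at progress 1.0 with a Gaussian kernel of bandwidth `tau`, and the load weights are simply passed in rather than derived with the analytic hierarchy process. All function names, dictionary keys and default values are hypothetical. The per-type task classification and the fireworks algorithm stage are not covered.

```python
import math


def lwlr_predict(samples, query, tau=0.5):
    """One-dimensional locally weighted linear regression.

    samples: list of (progress, elapsed_seconds) pairs for one task.
    query:   progress value at which to predict the elapsed time.
    tau:     Gaussian kernel bandwidth (assumed default, not from the thesis).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    sw = sx = sy = sxx = sxy = 0.0
    for x, y in samples:
        # Samples closer to the query point get a larger weight.
        w = math.exp(-((x - query) ** 2) / (2.0 * tau ** 2))
        sw += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    if sw == 0.0:
        raise ValueError("all kernel weights underflowed; increase tau")
    det = sw * sxx - sx * sx
    if abs(det) < 1e-12:
        # Degenerate fit (e.g. a single sample): fall back to the weighted mean.
        return sy / sw
    # Closed-form weighted least squares for y = intercept + slope * x.
    slope = (sw * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / sw
    return intercept + slope * query


def pick_slow_tasks(histories, fraction=0.25):
    """Rank tasks by predicted completion time and return the slowest fraction."""
    predicted = {tid: lwlr_predict(obs, 1.0) for tid, obs in histories.items()}
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


def node_load(usage, weights):
    """Weighted sum of the usage factors (weights are assumed inputs here)."""
    return sum(weights[k] * usage[k] for k in weights)


def pick_backup_node(nodes, weights):
    """Return the node with the largest ratio of success rate to load."""
    def score(name):
        info = nodes[name]
        # Guard against a zero load so the ratio stays finite.
        return info["success_rate"] / max(node_load(info["usage"], weights), 1e-9)
    return max(nodes, key=score)
```

With four tasks and the default `fraction`, `pick_slow_tasks` returns the single task with the largest predicted completion time. `pick_backup_node` prefers a lightly loaded node over a heavily loaded one when their success rates are similar, which matches the ratio criterion described above.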
Keywords/Search Tags: Hadoop, MapReduce, Slow tasks, Locally weighted linear regression, Analytic hierarchy process, Fireworks algorithm