
Research And Implementation Of Job Scheduling Performance Optimization Technology Based On Hadoop Cluster

Posted on: 2021-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: S S Wang
Full Text: PDF
GTID: 2428330605455972
Subject: Engineering
Abstract/Summary:
Hadoop implements distributed processing through MapReduce and has become a preferred tool for big data problems. The Hadoop platform realizes job scheduling through a simple programming interface. Job scheduling is responsible for managing the computing resources in the cluster and for scheduling the execution of jobs, so it affects both the performance of the Hadoop platform and the resource utilization of the system. A reasonable scheduling algorithm can effectively improve the efficiency with which the system processes jobs. This paper therefore focuses on the job scheduling algorithm: job priority is optimized first, and on that basis tasks are scheduled according to load.

To meet the needs of different users, an improved dynamic-priority scheduling strategy for mixed tasks is implemented on a purpose-built Hadoop cluster. The improved method computes a new priority for each job from three parameters: the static priority of the job, the task value of the job, and the estimated completion time of the job. Resources are allocated in priority order, which solves the problem of urgent tasks going unprocessed because of uneven resource allocation and improves system performance. Job priorities change dynamically over time: after a job finishes running, the scheduler traverses the job queue again to select the next job.

Under the default task scheduling strategy, a Hadoop cluster produces a large number of non-local tasks, which increases network transmission time, lengthens the average running time of jobs, and reduces the resource utilization of the Hadoop system. To address this problem, a load-balancing task scheduling strategy is proposed. In this method, a node is considered lightly loaded if the estimated completion time of its local tasks is less than the average estimated completion time of local tasks across all nodes; otherwise it is considered heavily loaded. When executing tasks, the scheduler first judges the load of each node and then allocates tasks according to that load. This strategy prevents some nodes from sitting idle while others are overloaded, thereby improving the ability of the cluster to process data.

Feasibility comparison experiments are used for testing and analysis. In the same experimental environment, the improved scheduling algorithm is compared with the original scheduling algorithm in terms of data locality and job completion time. The experimental results show that the improved scheduling algorithm increases data locality by 20.8% and reduces the average job completion time by 27.04%. The job queue runs in a reasonable and orderly manner, the job scheduling length is shorter, and the real-time performance of the system is improved.
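
The two scheduling rules described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract names the three priority inputs but not how they are combined, so the weighted sum in `dynamic_priority`, its weights, and the `Job` fields are all hypothetical. The node classification follows the stated rule (local-task estimate below the cluster average means lightly loaded). Treating a node exactly at the average as heavily loaded is an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Job:
    name: str
    static_priority: float      # user-assigned priority; higher means more urgent
    task_value: float           # value of the job's tasks; higher means more valuable
    est_completion_time: float  # estimated remaining run time, in seconds


def dynamic_priority(job, w_static=0.5, w_value=0.3, w_time=0.2):
    """Combine the three parameters named in the abstract into one score.

    The linear form and the weights are assumptions made for illustration;
    the abstract does not give the actual formula.
    """
    # A shorter estimated completion time yields a larger time term.
    time_term = 1.0 / (1.0 + job.est_completion_time)
    return (w_static * job.static_priority
            + w_value * job.task_value
            + w_time * time_term)


def pick_next_job(queue):
    """Traverse the whole queue and return the job with the highest priority.

    Calling this again after each job finishes re-evaluates every priority,
    so changes in the estimated completion time are picked up.
    """
    return max(queue, key=dynamic_priority)


def classify_nodes(local_est_times):
    """Label each node 'light' or 'heavy' by the rule stated in the abstract.

    local_est_times maps a node name to the estimated completion time of
    that node's local tasks. A node below the cluster average is 'light';
    any other node (including one exactly at the average) is 'heavy'.
    """
    avg = mean(local_est_times.values())
    return {node: ("light" if t < avg else "heavy")
            for node, t in local_est_times.items()}


if __name__ == "__main__":
    queue = [Job("report", 1, 1, 100.0), Job("alert", 3, 1, 100.0)]
    print(pick_next_job(queue).name)
    print(classify_nodes({"n1": 10.0, "n2": 30.0, "n3": 50.0}))
```

A scheduler built on this sketch would call `classify_nodes` before each assignment round and prefer nodes labelled `light` when placing tasks, then call `pick_next_job` whenever a job completes.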
Keywords/Search Tags: job scheduling algorithm, mixed task dynamic priority, real-time performance, node load