| Hadoop is a distributed, open-source cloud computing platform characterized by low cost and high fault tolerance; it is a parallel processing system that runs on large clusters. The Hadoop platform allocates resources to users' jobs and schedules their execution, so the suitability of the job scheduling algorithm directly affects how efficiently those jobs run. In many cases users rely on Hadoop's built-in scheduling algorithms, but these are poorly suited to batches of online query and analysis jobs and have problems scheduling short jobs, so research on Hadoop job scheduling algorithms and their optimization is necessary. The main contents of this paper are as follows: (1) We review current research at home and abroad on Hadoop job scheduling algorithms, with attention to the unreasonable scheduling of small jobs in batches of online query and analysis work, and analyze the principle of MapReduce job scheduling in Hadoop, the theoretical basis of job scheduling, and the procedures, advantages and disadvantages of Hadoop's built-in algorithms. (2) We analyze the principle, advantages and disadvantages of the MMS algorithm, which introduces queuing theory. To address the weakness of existing Hadoop job scheduling algorithms in handling small jobs, we propose a priority-based job scheduling algorithm for small jobs built on the M/G/1 model from queuing theory. When a job is submitted to the queue, jobs are first prioritized according to their length. The algorithm computes the average waiting time of the system from the M/G/1 model, and when a certain number of small jobs appear within a period of time, the small jobs are moved toward the front of the queue and their priorities are recalculated. (3) We set up Hadoop clusters for experiments. The results validate that the priority-based scheduling algorithm completes job scheduling and execution and reduces the overall waiting time of jobs. Test data are obtained by averaging over many runs. Comparative tests against the first-in-first-out algorithm and the "public and reallocation" job scheduling algorithm show that the priority-based algorithm adapts faster to workloads dominated by small jobs, adjusts the job order in a timely manner with respect to overall job completion time, and improves system utilization and load capacity. |
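
The abstract does not state the waiting-time formulas it uses, so the following is only a minimal sketch of how average waiting time in an M/G/1 queue could be computed, with and without length-based priorities. It assumes the standard Pollaczek-Khinchine result, W_q = λE[S²] / (2(1 − ρ)), and the standard non-preemptive priority extension; the class `JobClass`, the function names, and the example rates are hypothetical and are not taken from the paper or from any Hadoop API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobClass:
    """One priority class of jobs (hypothetical model, not a Hadoop type)."""
    arrival_rate: float   # lambda_i: jobs arriving per second
    mean_service: float   # E[S_i]: mean run time in seconds
    second_moment: float  # E[S_i^2]: second moment of run time, seconds^2


def mg1_mean_wait(arrival_rate: float, mean_service: float,
                  second_moment: float) -> float:
    """Mean queueing delay of a FIFO M/G/1 queue (Pollaczek-Khinchine)."""
    rho = arrival_rate * mean_service  # server utilization
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))


def priority_mean_waits(classes: List[JobClass]) -> List[float]:
    """Mean queueing delay per class under non-preemptive priority.

    Classes are listed from highest to lowest priority, so putting the
    short jobs first models "small jobs move to the front of the queue".
    """
    # Mean residual work seen by an arriving job, summed over all classes.
    residual = sum(c.arrival_rate * c.second_moment for c in classes) / 2.0
    waits = []
    load_above = 0.0  # utilization of strictly higher-priority classes
    for c in classes:
        load_incl = load_above + c.arrival_rate * c.mean_service
        if load_incl >= 1.0:
            raise ValueError("queue is unstable for this class")
        waits.append(residual / ((1.0 - load_above) * (1.0 - load_incl)))
        load_above = load_incl
    return waits


if __name__ == "__main__":
    # Assumed workload: many short jobs, few long ones (exponential run times).
    small = JobClass(arrival_rate=0.2, mean_service=1.0, second_moment=2.0)
    large = JobClass(arrival_rate=0.05, mean_service=10.0, second_moment=200.0)

    total_rate = small.arrival_rate + large.arrival_rate
    mixed_mean = (small.arrival_rate * small.mean_service
                  + large.arrival_rate * large.mean_service) / total_rate
    mixed_second = (small.arrival_rate * small.second_moment
                    + large.arrival_rate * large.second_moment) / total_rate

    fifo = mg1_mean_wait(total_rate, mixed_mean, mixed_second)
    w_small, w_large = priority_mean_waits([small, large])
    print(f"FIFO wait: {fifo:.2f} s")
    print(f"priority wait, small jobs: {w_small:.2f} s")
    print(f"priority wait, large jobs: {w_large:.2f} s")
```

Under these assumed rates, small jobs wait less with priorities than under first-in-first-out, while large jobs wait longer. This is the trade-off a small-job priority scheme relies on when small jobs dominate the workload.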