In recent years, with the development of cloud computing, academia and industry have begun to analyze and research cloud computing technology in depth in order to better cope with the arrival of the big data era. Apache Hadoop, an open-source cloud computing platform, has attracted widespread attention. However, as the volume of data grows, cloud service providers face increasingly large and complex data processing workloads. In particular, they must handle a large number of user-submitted data processing tasks of many different types, and processing these tasks efficiently is becoming increasingly difficult on the existing Hadoop platform. After in-depth analysis, this dissertation finds that the traditional job scheduling method strictly follows the first-come, first-served (FCFS) principle: small (less resource-intensive) tasks that arrive later must wait for large (resource-intensive) tasks that arrived before them, so their completion times are delayed excessively. Yet small tasks are generally more sensitive to response time than the large tasks ahead of them, so this method seriously degrades the user experience. This dissertation therefore formulates the problem as an optimization problem that considers two factors: efficient allocation of resources and the FCFS principle. Rather than following FCFS strictly, it adopts a scheduling policy that lets small tasks receive quick responses. After studying and comparing different scheduling strategies, the dissertation models the task scheduling problem as a knapsack problem, a complex combinatorial optimization problem that is NP-hard. Because genetic algorithms are well suited to searching for optimal or near-optimal solutions to this type of problem, the dissertation designs a new Hadoop scheduler that uses a genetic algorithm to assign tasks; the scheduler takes into account both the fair and efficient use of resources and the FCFS principle. The new scheduler was validated experimentally on the Hadoop platform and compared with the existing Capacity Scheduler, using real-time monitoring of system resources and statistics on the average job completion time. The experimental results show that the proposed scheduling algorithm is an effective job scheduling strategy and solves the problem of small tasks waiting too long.
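The abstract does not give the chromosome encoding, fitness function, or genetic operators, so the following Python sketch only illustrates one way a knapsack formulation of this kind could be solved with a genetic algorithm. It assumes each queued task i has a slot demand d_i and a waiting time w_i, and that the scheduler picks x_i in {0, 1} to maximize the sum of (d_i + λ·w_i)·x_i subject to the sum of d_i·x_i not exceeding the C free slots. The `Task` class, the value formula, the repair rule, `ga_schedule`, and all of its parameters are hypothetical illustrations, not the dissertation's scheduler, and nothing here calls any Hadoop API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    demand: int   # free slots the task needs (assumed resource unit)
    wait: float   # seconds the task has already spent in the queue


def value(task: Task, wait_weight: float) -> float:
    # Hypothetical knapsack value: reward slot utilisation, plus a bonus for
    # time already waited so that early arrivals keep some priority (soft FCFS).
    return task.demand + wait_weight * task.wait


def fitness(chrom, tasks, wait_weight):
    # Total value of the tasks selected by a 0/1 chromosome.
    return sum(value(t, wait_weight) for bit, t in zip(chrom, tasks) if bit)


def repair(chrom, tasks, capacity):
    # Make a chromosome feasible by deselecting the most recently queued
    # selected tasks (tasks are ordered by arrival) until the demand fits.
    chrom = list(chrom)
    used = sum(t.demand for bit, t in zip(chrom, tasks) if bit)
    for i in reversed(range(len(chrom))):
        if used <= capacity:
            break
        if chrom[i]:
            chrom[i] = 0
            used -= tasks[i].demand
    return chrom


def ga_schedule(tasks, capacity, wait_weight=0.1, pop_size=30,
                generations=80, mutation_rate=0.05, seed=0):
    """Return (chromosome, fitness) for the best feasible task subset found."""
    rng = random.Random(seed)
    n = len(tasks)

    def score(chrom):
        return fitness(chrom, tasks, wait_weight)

    def pick(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [repair([rng.randint(0, 1) for _ in range(n)], tasks, capacity)
           for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best]  # elitism: never lose the best schedule so far
        while len(children) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, then repair to restore the capacity constraint.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(repair(child, tasks, capacity))
        pop = children
        best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Queue in arrival order; 6 slots are currently free.
    queue = [Task("big-etl", 8, 30.0), Task("q1", 1, 5.0), Task("q2", 2, 4.0),
             Task("q3", 1, 2.0), Task("report", 5, 10.0), Task("q4", 2, 1.0)]
    chosen, total = ga_schedule(queue, capacity=6)
    print([t.name for bit, t in zip(chosen, queue) if bit], round(total, 2))
```

In the example queue, strict FCFS would leave the six free slots idle because the 8-slot task at the head cannot start, whereas the knapsack selection launches smaller tasks queued behind it. The `wait_weight` parameter (λ above) is the assumed knob that trades slot utilisation against arrival order; this sketch does not address how a task larger than the current free capacity is eventually guaranteed to run.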