The Research And Implementation Of Hadoop Scheduling Algorithm

Posted on: 2015-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
GTID: 2268330428978853
Subject: Computer application technology

Abstract/Summary:
Cloud computing, a new information technology for big data analysis and processing, offers a business model with large storage capacity, high reliability, and easy extension and distribution. It spreads computing tasks across a large resource pool so that users can obtain computing power, storage space, and information services on demand. Hadoop is an open-source cloud platform for data analysis and processing. It runs on large clusters that can consist of thousands of servers, where it schedules and processes large numbers of jobs and tasks. Scheduling mainly means allocating resources rationally and controlling the order in which jobs run. A suitable scheduling algorithm has a significant impact on job response time and on interactivity.

MapReduce, which comprises a map stage and a reduce stage, is a programming model and an associated implementation for processing large data sets. Dynamic resource allocation requires an estimate of MapReduce execution time, but current research seldom addresses this. We therefore present an improved estimation method. In the map stage, execution time is estimated by averaging historical information. In the reduce stage, it is estimated by combining sampling with feedback. This method estimates task execution time more accurately and provides an effective basis for dynamic resource allocation.

Hadoop's built-in scheduler cannot effectively distinguish between different types of jobs or ensure that jobs complete before a given time. We propose a type-specific and deadline-based scheduler (TSD). The algorithm consists of two parts: a mechanism that classifies jobs as CPU-bound or I/O-bound, and a deadline-based scheduling algorithm that sets priorities according to each job's final completion deadline. Experimental results show that, compared with an algorithm that considers only the deadline, TSD substantially improves the job success ratio, shortens job response time, and raises the utilization of the cluster's hardware resources.
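
The abstract gives no implementation details, so the following Python sketch is only an illustration of the ideas it names, not the thesis's TSD algorithm. It assumes a map-task estimate taken as the mean of a job's completed map-task durations, a job type decided by comparing CPU time with I/O time, and a priority based on slack (time left before the deadline minus the estimated remaining work). All names (`Job`, `estimate_map_time`, `classify`, `slack`, `pick_for_node`), the default estimate of 10 seconds, and the slack rule are hypothetical. The reduce-stage sampling and feedback estimate is not modelled.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Job:
    name: str
    cpu_seconds: float        # assumed: CPU time measured on sampled tasks
    io_seconds: float         # assumed: I/O time measured on sampled tasks
    deadline: float           # absolute completion deadline, in seconds
    remaining_map_tasks: int
    map_history: List[float]  # durations of this job's finished map tasks


def estimate_map_time(history: List[float], default: float = 10.0) -> float:
    """Average of historical durations; fall back to a default with no history."""
    return mean(history) if history else default


def classify(job: Job) -> str:
    """Label a job 'cpu' or 'io' by whichever resource dominates (assumed rule)."""
    return "cpu" if job.cpu_seconds >= job.io_seconds else "io"


def slack(job: Job, now: float, slots: int) -> float:
    """Time to spare before the deadline after the estimated remaining map work."""
    waves = -(-job.remaining_map_tasks // slots)  # ceiling division
    remaining = waves * estimate_map_time(job.map_history)
    return (job.deadline - now) - remaining


def pick_for_node(jobs: List[Job], now: float, slots: int,
                  node_running_type: Optional[str]) -> Optional[Job]:
    """Pick the most urgent job whose type differs from what the node already runs.

    If every job has the same type as the node's current load, fall back to
    the most urgent job overall.
    """
    ordered = sorted(jobs, key=lambda j: slack(j, now, slots))
    for job in ordered:
        if classify(job) != node_running_type:
            return job
    return ordered[0] if ordered else None


jobs = [
    Job("a", cpu_seconds=8, io_seconds=2, deadline=100,
        remaining_map_tasks=4, map_history=[10, 12, 14]),
    Job("b", cpu_seconds=1, io_seconds=9, deadline=60,
        remaining_map_tasks=2, map_history=[5, 5]),
    Job("c", cpu_seconds=7, io_seconds=3, deadline=50,
        remaining_map_tasks=10, map_history=[]),
]
chosen = pick_for_node(jobs, now=0, slots=2, node_running_type="cpu")
print(chosen.name)
```

Pairing a CPU-bound job with a node that is already busy with I/O-bound work (and the reverse) is one plausible way to use the job type to raise hardware utilization; the thesis may combine type and deadline differently.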
Keywords/Search Tags: Cloud Computing, Hadoop, MapReduce, Scheduling, Time estimation, Deadline, Job-type