The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform

Posted on: 2017-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: X W Li
Full Text: PDF
GTID: 2348330503481936
Subject: Software engineering
Abstract/Summary:
With the development of science and technology in recent years, data volumes have grown explosively. Because it is efficient and scalable, cloud computing is widely used to analyze and process massive amounts of data.

MapReduce is a distributed computing model whose main characteristic is that it encapsulates the details of traditional distributed programming, separating business logic from implementation details. Hadoop is a widely used open source implementation of the MapReduce computing model. The job scheduling algorithm is one of the core algorithms of Hadoop and is mainly responsible for scheduling tasks and allocating resources. Because it directly affects the performance of the cluster, research on job scheduling algorithms is significant.

First, this dissertation introduces three commonly used job scheduling algorithms (FIFO, Fair Scheduler, and Capacity Scheduler) as well as some improved algorithms proposed by academia.

Second, to address the shortcomings of a low data localization rate and long waiting times, this dissertation proposes a data preprocessing scheduling algorithm, DP-L, based on the fair scheduling algorithm. DP-L transfers the input data to the node's disk before a non-local task is scheduled to run. At the cost of network resources and disk space, DP-L improves the data localization of the cluster.

Third, to address the shortcomings of a low execution rate and long response times, this dissertation proposes an algorithm named DP-R, which is based on the main resource. DP-R calculates the main resource share of each user and job, and allocates resources to the user and job whose main resource share is smallest. While preserving fairness among users and jobs, DP-R improves the efficiency of resource use.

Finally, four experiments are designed to verify the performance and feasibility of the algorithms in this dissertation. The experimental results show that the proposed algorithms not only improve the task execution rate and shorten the response time of the cluster, but also improve the data localization rate.
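The selection step described for DP-R can be illustrated with a small sketch. The abstract does not define "main resource share", so the code below assumes it is the largest fraction of cluster capacity that a user or job holds across resource types, in the spirit of dominant-resource fairness. The resource types (CPU and memory), the `Job` record, the capacity figures, and the function names are all hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    user: str
    cpu: float  # CPU cores currently allocated (assumed resource type)
    mem: float  # memory in GB currently allocated (assumed resource type)

# Hypothetical cluster capacity, for illustration only.
CLUSTER = {"cpu": 100.0, "mem": 400.0}

def main_share(cpu, mem, capacity):
    """Assumed definition: the largest fraction of any resource held."""
    return max(cpu / capacity["cpu"], mem / capacity["mem"])

def pick_next(jobs, capacity):
    """Pick the user with the smallest main share, then that user's
    job with the smallest main share."""
    # Aggregate current allocations per user.
    usage = {}
    for job in jobs:
        cpu, mem = usage.get(job.user, (0.0, 0.0))
        usage[job.user] = (cpu + job.cpu, mem + job.mem)
    user = min(usage, key=lambda u: main_share(*usage[u], capacity))
    candidates = [j for j in jobs if j.user == user]
    return min(candidates, key=lambda j: main_share(j.cpu, j.mem, capacity))

jobs = [
    Job("a1", "alice", cpu=10, mem=20),
    Job("a2", "alice", cpu=5, mem=80),
    Job("b1", "bob", cpu=20, mem=10),
    Job("b2", "bob", cpu=2, mem=4),
]
chosen = pick_next(jobs, CLUSTER)
print(chosen.user, chosen.name)
```

In this example alice's main share is 0.25 (memory) and bob's is 0.22 (CPU), so bob is selected first, and among bob's jobs `b2` holds the smaller share. A real scheduler would also need to check that the chosen job's next task fits on the available node, which this sketch leaves out.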
Keywords/Search Tags: Cloud Computing, Hadoop, MapReduce, Resource Allocation, Data Localization