Research and Implementation of a Job Scheduling Method for Multi-User MapReduce Clusters

Posted on: 2011-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Wang
Full Text: PDF
GTID: 2178330338489889
Subject: Computer Science and Technology
Abstract/Summary:
Current data-intensive computing must process petabyte-scale data sets and gigabyte-scale data streams, and it faces the problems of large-scale data management, management of complex computing environments, and the need for a scalable computing platform. Hadoop is a scalable distributed computing architecture that combines many inexpensive PCs to provide high computing power, and its MapReduce parallel computing framework offers users a simple programming model.

This thesis analyzes in depth the job scheduling approaches used in existing Hadoop clusters and then studies the poor data locality caused by the existing methods of multi-user job scheduling. Because the existing Hadoop scheduling algorithms cannot achieve good data locality, we implement a waiting-time-based scheduling method that gives priority to scheduling a task on a node where its required data is stored. This achieves better data locality and reduces the I/O overhead of the computation, which in turn increases system throughput and reduces the average response time of a single job.

To verify the validity of the method, we present its design and implementation and evaluate it experimentally. The results show that the method guarantees that multiple users share the cluster fairly, greatly improves data locality, increases the throughput of the cluster, and reduces the average response time of a single job.
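The abstract does not give the details of the waiting-time-based method, so the following Python sketch is only an illustration of the general idea under stated assumptions: jobs are ordered by a simple fairness rule (fewest running tasks first), a job that has no data-local task for the free node is skipped, and it may take a non-local slot only after it has waited at least `max_wait`. The names `Task`, `Job`, `pick_task`, and `max_wait` are hypothetical and do not come from the thesis or from Hadoop.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Task:
    task_id: str
    data_nodes: Set[str]  # nodes assumed to hold a replica of the task's input


@dataclass
class Job:
    job_id: str
    pending: List[Task] = field(default_factory=list)
    running: int = 0                       # running tasks, used for fair ordering
    waiting_since: Optional[float] = None  # time the job first declined a non-local slot


def pick_task(jobs: List[Job], node: str, now: float,
              max_wait: float) -> Optional[Tuple[str, str, bool]]:
    """Pick a task for a free slot on `node`.

    Returns (job_id, task_id, is_local), or None if every job prefers to wait.
    """
    # Assumed fairness rule: the job with the fewest running tasks goes first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        # Prefer a task whose input data is stored on this node.
        local = next((t for t in job.pending if node in t.data_nodes), None)
        if local is not None:
            job.pending.remove(local)
            job.running += 1
            job.waiting_since = None
            return job.job_id, local.task_id, True
        # No local task: start (or continue) waiting for a better node.
        if job.waiting_since is None:
            job.waiting_since = now
        if now - job.waiting_since >= max_wait:
            # The job has waited long enough; accept a non-local slot.
            task = job.pending.pop(0)
            job.running += 1
            job.waiting_since = None
            return job.job_id, task.task_id, False
        # Otherwise skip this job and let the next one try the slot.
    return None
```

In this sketch, a larger `max_wait` trades a longer wait for a higher chance of a data-local launch, while `max_wait = 0` falls back to launching on whichever node is free.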
Keywords/Search Tags: Distributed Computing, MapReduce, Hadoop, Job Scheduling, Waiting Scheduling, Priority, Multi-user Shared