Research on Algorithm Analysis and Modification of Job Scheduling for Hadoop

Posted on: 2014-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q R Yang
Full Text: PDF
GTID: 2268330401474254
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet and the growing demand for large-scale data processing and computation, Cloud Computing emerged to meet this need. Hadoop is an open-source distributed computing platform widely used by companies and researchers. One of the core technologies of the Hadoop platform is job scheduling, which directly affects resource utilization and system efficiency. It is therefore both necessary and worthwhile to analyze and improve the job scheduling algorithms of the Hadoop platform. This thesis focuses on the following aspects:

First, the background, related concepts, and technological development of Cloud Computing and the Hadoop platform are described on the basis of a review of the related literature. The two core components of Hadoop, HDFS and MapReduce, are explained in particular.

Second, the existing mainstream job scheduling algorithms in Hadoop are studied in detail: First In First Out (FIFO), FairScheduler, and CapacityScheduler. The practical behavior of the three schedulers is compared experimentally to identify the application scenarios each suits best, together with its merits and drawbacks.

Third, to address the weakness of FairScheduler on memory-intensive jobs, FMScheduler, a new fair scheduling algorithm based on memory balance, is proposed as an optimization of the original algorithm. Building on the original algorithm, the thesis gives a new method for calculating and adjusting job weights and adopts a memory comparison mechanism and a job resource reservation mechanism.

Finally, the performance of FMScheduler is compared with that of three other scheduling algorithms under various job submission scenarios. The experimental results show that, in multi-user, multi-job environments containing memory-intensive jobs, FMScheduler performs best of the four: it achieves the lowest average and total job response times, improves resource utilization, and gives memory-intensive jobs fairer opportunities to execute.
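
The abstract does not give FMScheduler's weight formula or the details of its mechanisms, so the following is only a minimal sketch of the general idea of combining fair sharing with a memory comparison step. Every name in it (`Job`, `pick_job`, the `weight` and `task_memory_mb` fields) is hypothetical, and the selection rule is an assumption made for illustration, not the method proposed in the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    weight: float        # share weight, must be > 0 (hypothetical; not the thesis's formula)
    running_tasks: int   # tasks of this job currently running
    task_memory_mb: int  # memory one task of this job needs (assumed to be known)


def pick_job(jobs: List[Job], free_memory_mb: int) -> Optional[Job]:
    """Return the most under-served job whose next task fits in free memory."""
    # Memory comparison: only jobs whose next task fits on the node are candidates.
    candidates = [j for j in jobs if j.task_memory_mb <= free_memory_mb]
    if not candidates:
        # Nothing fits. A real scheduler might reserve resources for a waiting job here.
        return None
    # Fair sharing: the lowest running_tasks / weight ratio marks the job
    # that is furthest below its weighted share.
    return min(candidates, key=lambda j: j.running_tasks / j.weight)


jobs = [
    Job("sort", weight=1.0, running_tasks=4, task_memory_mb=512),
    Job("join", weight=2.0, running_tasks=2, task_memory_mb=2048),
    Job("grep", weight=1.0, running_tasks=3, task_memory_mb=256),
]
print(pick_job(jobs, free_memory_mb=1024).name)  # join does not fit, so grep is chosen
print(pick_job(jobs, free_memory_mb=4096).name)  # everything fits, so join is chosen
```

In this toy version a memory-intensive job such as `join` is simply skipped when a node lacks memory; a reservation mechanism like the one the abstract mentions would be one way to keep such jobs from being skipped indefinitely.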
Keywords/Search Tags: Cloud Computing, Hadoop, MapReduce, Job Scheduling