Job Scheduling Algorithms Based on the Hadoop Platform

Posted on: 2012-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Yu
GTID: 2218330338455808
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology and the explosive growth of Internet data, massive data processing has become a pressing problem. Cloud computing has been proposed as a new model and has developed rapidly. Hadoop, an open-source cloud computing system, imitates and implements the main Google cloud computing technologies and is widely used. Hadoop is a platform under continuous development and improvement, and Hadoop job scheduling is a hot topic in both academic research and industry. Improving job scheduling enhances the capacity for massive data processing, so it has important practical significance for improving the performance and resource utilization of the Hadoop platform.
This thesis describes the technical background of Hadoop, then introduces the core of the Hadoop platform, namely the Hadoop Distributed File System (HDFS) and the MapReduce computation framework, and analyzes the Hadoop job scheduling process in detail. I then study the existing scheduling algorithms on the Hadoop platform, namely the FIFO algorithm, the capacity scheduling algorithm, and the fair scheduling algorithm, with a detailed analysis of the fair scheduling algorithm.
Based on this study of Hadoop job scheduling, I propose improvements to the job scheduling algorithm. First, I analyze data locality in the fair scheduling algorithm and the delay scheduling algorithm built on it, and propose an improved delay algorithm with a response time T that guarantees the Service Level Agreement (SLA) requirements of specific users (such as paying customers); this is mainly intended for short jobs.
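The abstract does not give the details of the improved delay algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a job normally waits up to a fixed delay for a node that holds its input data, and for a job with a response time target T the wait is capped so that the estimated run time still fits within T. The names `Job`, `assign`, `est_runtime`, and `default_delay` are hypothetical and do not come from the thesis or from Hadoop.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Job:
    name: str
    submit_time: float                 # time the job entered the queue
    est_runtime: float                 # hypothetical run-time estimate
    local_nodes: Set[str] = field(default_factory=set)  # nodes holding input blocks
    deadline_t: Optional[float] = None  # response time target T; None for ordinary jobs


def assign(job: Job, node: str, now: float, default_delay: float = 5.0) -> str:
    """Decide what to do when `node` offers a free slot to `job`.

    Returns "local" (run with local data), "non-local" (run anyway),
    or "wait" (skip this slot and hope for a local one).
    """
    if node in job.local_nodes:
        return "local"
    waited = now - job.submit_time
    if job.deadline_t is None:
        allowed = default_delay
    else:
        # Assumed rule: never wait longer than the slack left by T.
        slack = job.deadline_t - job.est_runtime
        allowed = max(0.0, min(default_delay, slack))
    return "non-local" if waited >= allowed else "wait"


paying = Job("short-job", submit_time=0.0, est_runtime=2.0,
             local_nodes={"n1"}, deadline_t=4.0)
print(assign(paying, "n2", now=1.0))   # still within the capped wait
print(assign(paying, "n2", now=2.0))   # slack used up, give up locality
```

Under this assumed rule, a job with a tight T gives up waiting for a local slot earlier than an ordinary job, which trades some data locality for a bounded response time.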
Second, so that nodes can use past history and learned job properties to improve job scheduling, I propose a Feature Weighting-based Naive Bayes classification algorithm to improve task allocation in scheduling, analyze the ideas behind the algorithm in detail, and complete a prototype design and implementation.
I then built a test environment in our lab to evaluate the improved algorithms. The first experiment tests the delay algorithm that guarantees a specific response time T. It showed that the algorithm met the response time requirement T, at the cost of some data locality. The second experiment evaluates the Feature Weighting-based Naive Bayes classification scheduling algorithm, testing its learning ability, the impact of feature weighting on job performance, its decision-making accuracy, and its performance compared with existing scheduling algorithms.
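As an illustration of the feature-weighted Naive Bayes idea named above, the sketch below scores each class as log P(c) plus the sum over features of w_i * log P(x_i | c), with Laplace smoothing. The abstract does not state the features, labels, or weights used in the thesis, so the feature names ("cpu", "io"), the labels ("good", "bad"), the weight values, and the class name `WeightedNaiveBayes` are all assumptions made for this example.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Categorical Naive Bayes with a per-feature weight on each log-likelihood."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights                  # feature name -> weight (assumed values)
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = defaultdict(int)    # label -> number of samples
        self.feature_counts = defaultdict(int)  # (label, feature, value) -> count
        self.values = defaultdict(set)          # feature -> distinct values seen

    def update(self, features, label):
        """Learn from one past task: its features and the observed outcome."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[(label, name, value)] += 1
            self.values[name].add(value)

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        n_classes = len(self.class_counts)
        n_label = self.class_counts[label]
        score = math.log((n_label + self.alpha) / (total + self.alpha * n_classes))
        for name, value in features.items():
            k = len(self.values[name] | {value})   # number of possible values
            count = self.feature_counts[(label, name, value)]
            p = (count + self.alpha) / (n_label + self.alpha * k)
            score += self.weights.get(name, 1.0) * math.log(p)
        return score

    def predict(self, features):
        return max(list(self.class_counts), key=lambda c: self.log_score(features, c))


nb = WeightedNaiveBayes({"cpu": 2.0, "io": 1.0})
for _ in range(3):
    nb.update({"cpu": "high", "io": "low"}, "good")
    nb.update({"cpu": "low", "io": "high"}, "bad")
print(nb.predict({"cpu": "high", "io": "high"}))
```

In the last call the two features point to different classes; because "cpu" carries the larger weight in this made-up configuration, it decides the outcome, whereas equal weights would leave the two classes tied.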
Keywords/Search Tags:Cloud Computing, MapReduce, Job Scheduling, Feature Weighted Naive Bayes