
Research And Development Of New Scheduler In Hadoop Cloud Platform Based On Improved Simulated Annealing

Posted on: 2015-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2298330434458753
Subject: Software engineering
Abstract/Summary:
With the development of cloud computing platforms, more and more colleges, universities, research institutes, and Internet companies have begun in-depth cloud platform projects in order to better meet the coming age of big data. As a fully open-source cloud platform, Apache Hadoop is favored by most companies, engineers, and experts, many of whom have participated in the research and development of the Hadoop cloud computing platform.

However, with the rapid development of cloud computing, cloud service providers face increasingly large and complex data to process. The variety of structured and unstructured data is very difficult for the existing Hadoop platform to handle, and native Hadoop now struggles to respond effectively to the variety of complex tasks submitted by users.

In this thesis, we address the long waiting time and completion time that occur when the existing schedulers handle jobs with high memory requirements under the MapReduce framework. We studied the scheduling policy of the Capacity Scheduler and proposed a new queue-level scheduling policy based on the simulated annealing algorithm. The policy uses queue resource utilization as the annealing probability and takes parameters such as the expected completion time and resource limits. It optimizes the scheduling behavior of the Capacity Scheduler by exploiting characteristics of simulated annealing such as high efficiency and weak constraints on initial conditions. Our work is as follows.

First, we analyzed the design and operating mechanism of the current Hadoop platform, studied the MapReduce processing framework, and examined the existing Hadoop schedulers in depth: the default FIFO scheduler, the native Fair Scheduler, the native Capacity Scheduler, and the Resource Aware Scheduler and Adaptive Scheduler, which had been designed but not officially adopted before Hadoop 2.0. For these five schedulers, we explored their design, analyzed their scheduling mechanisms, and pointed out the problems that currently exist in them.

Second, building on this summary of common problems in the existing schedulers, we proposed and designed a new scheduler that effectively addresses the difficulty of scheduling memory-intensive jobs. Taking an improved simulated annealing algorithm as the design idea, we analyzed the traditional simulated annealing algorithm, proposed a method for applying the improved algorithm in a scheduler, designed the scheduling policy based on it, and proposed the new scheduler for Hadoop.

Finally, we tested the new scheduler in a real environment, covering free switching between schedulers in Hadoop, scheduling of different types of jobs, and comparison of different schedulers on the same job. The experimental results show that the proposed scheduling policy effectively reduces the likelihood of tasks waiting when scheduling jobs with high memory requirements, and also achieves lower completion time and better resource utilization.
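The abstract does not give the details of the queue-level policy, so the following is only a minimal sketch of how a simulated annealing acceptance rule could drive queue selection. Everything in it is an assumption made for illustration: the `Queue` class, the `pick_queue` function, the use of memory utilization as the energy to minimize, the Metropolis acceptance rule, and the geometric cooling schedule. It is not the thesis's actual scheduler and does not use any Hadoop API.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical stand-in for a Capacity Scheduler queue."""
    name: str
    capacity_mb: int  # memory guaranteed to the queue (assumed > 0)
    used_mb: int      # memory currently in use

    @property
    def utilization(self) -> float:
        return self.used_mb / self.capacity_mb


def pick_queue(queues, temperature=1.0, cooling=0.9,
               min_temperature=1e-3, rng=None):
    """Pick a queue by simulated annealing, treating utilization as energy.

    Assumption: a lower utilization is better. A less-loaded candidate is
    always accepted; a more-loaded one is accepted with the Metropolis
    probability exp(-delta / temperature), which shrinks as the
    temperature cools.
    """
    rng = rng or random.Random()
    current = rng.choice(queues)
    while temperature > min_temperature:
        candidate = rng.choice(queues)
        delta = candidate.utilization - current.utilization
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        temperature *= cooling  # geometric cooling schedule
    return current


if __name__ == "__main__":
    queues = [
        Queue("default", capacity_mb=8192, used_mb=7000),
        Queue("batch", capacity_mb=8192, used_mb=2000),
        Queue("adhoc", capacity_mb=4096, used_mb=3000),
    ]
    chosen = pick_queue(queues, rng=random.Random(42))
    print(chosen.name, round(chosen.utilization, 2))
```

Early in the loop, when the temperature is high, the rule still accepts more heavily loaded queues fairly often, which keeps the choice from locking onto the first queue sampled. As the temperature drops, the selection tends to settle on lightly loaded queues, although the result remains random and is not guaranteed to be the least-loaded queue.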
Keywords/Search Tags: MapReduce, Capacity Scheduler, Simulated Annealing, Queue Scheduling Policy