
Research And Implementation Of Multi-layer Job Scheduling Algorithm Based On Hama Parallel Computing Framework

Posted on: 2015-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Hu
GTID: 2308330473453321
Subject: Computer system architecture
Abstract/Summary:
With the development of Big Data technologies in industry and academia, a large number of distributed computing platforms have been built. The most widely used is Apache Hadoop. Hadoop hides the implementation details of the distributed system and lets application developers focus on algorithm logic. However, Hadoop has its limitations: its efficiency is very low for graph computation and machine learning workloads. Apache Hama, which is based on the BSP (Bulk Synchronous Parallel) model, makes up for this shortcoming. Hama is a young platform, though, and many of its components need improvement, especially its core module, the job scheduler. Hama currently uses a first-come-first-serve job scheduling algorithm that cannot share cluster resources among multiple users and that also adversely affects the resource utilization of the cluster.

The purpose of this thesis is to design and implement a new job scheduling algorithm for the Hama parallel computing framework, in order to compensate for the shortcomings of Hama's first-come-first-serve job scheduling algorithm, improve Hama's resource utilization, and provide greater flexibility in Hama's job scheduling. To achieve this goal, the main work and contributions of this thesis are as follows.

Firstly, by reading the Hama source code, this thesis analyzes the system architecture of Hama, introduces the implementation of the BSP model in Hama, summarizes how a job runs in Hama, and studies Hama's scheduling framework and its first-come-first-serve job scheduling algorithm. It also summarizes the HDFS and MapReduce technologies that Hama relies on and compares the scheduling modes of MapReduce and Hama.

Subsequently, based on the outcome of this research and on the characteristics of the BSP model, this thesis designs and implements a multi-layer job scheduling algorithm for the Hama parallel computing framework, and describes the design idea and implementation process of the algorithm in detail.

Finally, this thesis reports experiments that verify and test the algorithm. The results indicate that the design goals were achieved: the algorithm compensates for the shortcomings of Hama's first-come-first-serve job scheduling algorithm, improves Hama's resource utilization, and performs better than Hama's first-come-first-serve job scheduler.
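To make the multi-user limitation of first-come-first-serve scheduling concrete, the following minimal Python sketch contrasts strict arrival order with a simple per-user round-robin order. It is an illustration only: the function names, the user names, and the round-robin policy are assumptions introduced here, and the round-robin policy is not the multi-layer algorithm of the thesis, whose details the abstract does not give.

```python
from collections import OrderedDict, deque


def fcfs_order(jobs):
    """Return job names in strict arrival order (first come, first served)."""
    return [name for _user, name in jobs]


def per_user_round_robin(jobs):
    """Hypothetical alternative: cycle over users, taking each user's oldest job.

    This is NOT the thesis's multi-layer algorithm; it only shows one way
    a scheduler can share a cluster among several users.
    """
    queues = OrderedDict()
    for user, name in jobs:
        queues.setdefault(user, deque()).append(name)

    order = []
    while queues:
        # Iterate over a snapshot so that emptied queues can be removed safely.
        for user in list(queues):
            order.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]
    return order


# Hypothetical workload: alice submits three jobs before bob submits one.
jobs = [("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]

print(fcfs_order(jobs))            # bob's job waits behind all of alice's jobs
print(per_user_round_robin(jobs))  # bob's job runs after alice's first job
```

Under strict arrival order, a user who submits many jobs first delays every later user, which is the sharing problem the abstract attributes to Hama's default scheduler.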
Keywords/Search Tags: Apache Hama, BSP model, Distributed Computing, Job Scheduling Algorithm