
Research On Resource Management And Job Scheduling Based On Hadoop

Posted on: 2018-09-03  Degree: Master  Type: Thesis
Country: China  Candidate: K W Zhang  Full Text: PDF
GTID: 2348330518961209  Subject: Engineering
Abstract/Summary:
In recent years, cloud computing has become an active research area that has attracted the attention of many scholars and enterprises. Many cloud computing systems have emerged, and Hadoop, a distributed storage and parallel computing framework, holds an important position among them. Resource management and job scheduling have always been important issues for Hadoop because they directly affect the overall performance of the platform and the utilization of system resources. As user requirements grow, the existing Hadoop scheduling algorithms have begun to show shortcomings that degrade overall Hadoop performance, so an improvement strategy that addresses these shortcomings is needed.

This thesis first reviews the relevant literature on cloud computing, Google's GFS and MapReduce, and Hadoop. Building on that theoretical study, it examines the Hadoop resource management and job scheduling framework in depth from three aspects: the distributed file system HDFS, the distributed computing framework MapReduce, and three common Hadoop scheduling algorithms. It covers the architecture, principles, and operation flow of HDFS and MapReduce, and it analyzes the Hadoop job scheduling flow, the FIFO, Capacity, and Fair scheduling algorithms, and the problems of each.

Because Hadoop's default FIFO scheduling algorithm lacks flexibility and does not adapt dynamically, we design an improved scheduling algorithm, based on the existing FIFO algorithm, that uses CPU utilization and remaining memory as load indicators. The improved algorithm compares real-time CPU usage and remaining memory against the corresponding thresholds to determine whether to continue assigning tasks. To compare the scheduling algorithm before and after the improvement, test data was processed on a Hadoop cluster, and the improved algorithm was shown to perform better than the original.
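
The threshold check described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the threshold values, the `NodeLoad` record, and the helper names are all hypothetical, and it assumes that load is measured per worker node and that each node is offered at most one task per scheduling round.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical thresholds; the abstract does not state the values used.
CPU_UTILIZATION_MAX = 0.80  # do not assign above 80% CPU utilization
FREE_MEMORY_MIN_MB = 1024   # do not assign below 1024 MB of free memory


@dataclass
class NodeLoad:
    """Load snapshot reported by one worker node (hypothetical record)."""
    name: str
    cpu_utilization: float  # fraction in [0, 1]
    free_memory_mb: int
    assigned: list = field(default_factory=list)


def can_accept_task(node, cpu_max=CPU_UTILIZATION_MAX,
                    mem_min=FREE_MEMORY_MIN_MB):
    """Return True only if both load indicators are within their thresholds."""
    return node.cpu_utilization <= cpu_max and node.free_memory_mb >= mem_min


def assign_fifo(tasks, nodes):
    """Run one scheduling round in FIFO order, skipping overloaded nodes.

    Each node receives at most one task per round. Returns the tasks that
    are still waiting after the round.
    """
    queue = deque(tasks)
    for node in nodes:
        if not queue:
            break
        if can_accept_task(node):
            node.assigned.append(queue.popleft())
    return list(queue)


if __name__ == "__main__":
    nodes = [
        NodeLoad("node-a", 0.50, 4096),
        NodeLoad("node-b", 0.95, 4096),  # CPU above threshold
        NodeLoad("node-c", 0.30, 512),   # too little free memory
        NodeLoad("node-d", 0.20, 2048),
    ]
    waiting = assign_fifo(["t1", "t2", "t3"], nodes)
    for node in nodes:
        print(node.name, node.assigned)
    print("still waiting:", waiting)
```

Tasks keep their arrival order, as in plain FIFO; the only change in this sketch is that a node whose CPU utilization or free memory violates a threshold is passed over until its load recovers.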
Keywords/Search Tags:Hadoop, Resource Management, Scheduling Algorithm, MapReduce