
Design and Implementation of a Task Resource Allocation and Control System Based on Hadoop

Posted on: 2017-05-02, Degree: Master, Type: Thesis
Country: China, Candidate: Y Wang
GTID: 2348330518994768, Subject: Computer technology
Abstract/Summary:
Hadoop is an Apache open-source software framework, implemented in Java, for developing and running large-scale data processing applications. It performs distributed computation over massive data sets on clusters made up of large numbers of computers. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides mass data storage, and MapReduce provides data processing. As a cloud computing solution, Hadoop has attracted growing attention. The Hadoop resource scheduler allocates computing resources to the tasks that need them so that they can complete successfully. Good resource scheduling makes full use of resources, avoids task execution problems, and improves the utilization of computing resources, so it is worthwhile to handle task resource allocation and control in Hadoop well. Building on an extensive study of scheduling algorithms, this paper uses historical data to guide task resource allocation and control in Hadoop.

The paper first introduces an information collection system. It collects, transports, and stores real-time information from the compute nodes, together with execution information for map tasks and reduce tasks. The collected information is stored in a database for later use in scheduling.

A broad and detailed study of fair scheduling identified two areas for improvement. First, tasks differ in how much memory they need. A program needs a certain amount of memory to run; if a task needs a large amount of memory but the compute node assigned to it cannot provide enough, the task cannot execute normally. It then runs very slowly on that node and affects the execution of other tasks. Second, the fair scheduling algorithm balances load across nodes by the number of tasks. However, tasks have different characteristics, consume different resources, and belong to different types of jobs, so allocating by task count does not achieve good load balancing.

This paper proposes using historical data to estimate the memory required by the job about to be scheduled, and comparing it with the memory status of the current node, to predict whether the job can be expected to complete successfully; admission control is applied on that basis. To achieve load balancing, the job types in the task queue and the types of tasks already on the node are analyzed, and the most suitable task is selected from the job queue.
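
As a rough illustration of the two ideas above (memory-based admission control and type-aware task selection), the following minimal Python sketch may help. It is not the thesis's implementation. Every name in it (`Task`, `Node`, `estimate_memory_mb`, `admit`, `select_task`) is hypothetical, and so are the choices to estimate memory as the historical mean times a safety factor, to admit tasks with no history, and to prefer the job type least represented on the node.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    job_type: str  # hypothetical label such as "cpu" or "io"


@dataclass
class Node:
    free_memory_mb: float
    running_types: List[str]  # job types of tasks already running on the node


def estimate_memory_mb(history: Dict[str, List[float]], job_type: str,
                       safety_factor: float = 1.25) -> Optional[float]:
    """Estimate memory need from past runs of the same job type.

    Assumption: historical mean scaled by a safety factor. Returns None
    when no history exists for this job type.
    """
    samples = history.get(job_type)
    if not samples:
        return None
    return mean(samples) * safety_factor


def admit(task: Task, node: Node, history: Dict[str, List[float]]) -> bool:
    """Admission control: admit only if the estimate fits in free memory."""
    estimate = estimate_memory_mb(history, task.job_type)
    if estimate is None:
        return True  # assumption: admit unknown types so history can be collected
    return estimate <= node.free_memory_mb


def select_task(queue: List[Task], node: Node,
                history: Dict[str, List[float]]) -> Optional[Task]:
    """Pick an admissible task whose type is least represented on the node.

    Ties go to the task that appears earliest in the queue.
    """
    admissible = [t for t in queue if admit(t, node, history)]
    if not admissible:
        return None
    return min(admissible, key=lambda t: node.running_types.count(t.job_type))
```

For example, with a history of `{"cpu": [500, 700], "io": [1000, 1400]}`, a node with 1000 MB free would reject an `io` task (estimated at 1500 MB) and admit a `cpu` task (estimated at 750 MB). A node with 2000 MB free that is already running two `cpu` tasks would pick the `io` task, which mixes the load instead of only counting tasks.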
Keywords/Search Tags:Hadoop, admission control, load balancing