Research on Hadoop Job Scheduling Algorithms in Big Data

Posted on: 2016-02-21  Degree: Master  Type: Thesis
Country: China  Candidate: P H Liu
GTID: 2308330461991706  Subject: Computer Science and Technology
Abstract/Summary:
With the rise of the mobile Internet, cloud computing, the Internet of Things, and other new technologies, the characteristics of data, such as size, type, speed, and value, have reached an unprecedented scale in a short time. Traditional data has undergone a qualitative change, and the era of big data has formally arrived. Hadoop, the first choice for storing and processing huge amounts of data, is an open-source platform that supports distributed processing of such data. Its advantages, including high efficiency, high reliability, high scalability, and fault tolerance, have attracted wide attention in industry and academia. The new generation of Hadoop introduces the resource management system YARN. As the core module of Hadoop 2.0, YARN is mainly responsible for resource management and scheduling for all kinds of applications. The quality of the job scheduling algorithm directly affects the overall performance of the Hadoop platform and the utilization of system resources, so the study of job scheduling algorithms is of great significance.
Load balancing is very important in a cluster system. How to distribute computing resources reasonably across the platform and balance the resource load is an important problem that a Hadoop cluster needs to solve. Existing job scheduling algorithms, which pursue the shortest job completion time without considering the load capacity of nodes, are designed for homogeneous environments and easily cause load imbalance among cluster nodes in a heterogeneous environment. Multiple jobs can end up competing for the most capable resources, leaving some computing resources heavily loaded while others sit idle, which seriously degrades the performance of the platform. To address this cluster load imbalance, the Load Balancing Measure Function (LBMF) is proposed.
To address the Hadoop load imbalance problem in a heterogeneous environment, Load Balance Measure Function Particle Swarm Optimization (LBMFPSO) is proposed. In LBMFPSO, each job to be dispatched is treated as a particle, and the node locations form the search space. LBMF and job execution time are used together as the fitness function to guide the distribution of particles, so the process of particles finding the optimal solution is the process of job scheduling. LBMF < 5% is used as the termination criterion of LBMFPSO, so that the job scheduling results conform to load balancing. Experimental results on the Hadoop platform show that LBMFPSO scheduling shortens job completion time and raises the utilization of system resources. Both job execution time and resource load balancing are taken into account.
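The scheduling idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made only for illustration: the abstract does not define LBMF, so the sketch uses the coefficient of variation of node busy times as a stand-in; the fitness combines makespan and this stand-in multiplicatively; one particle encodes a complete job-to-node assignment; and the update rule is a simplified discrete PSO variant (copy from the global best, copy from the personal best, or mutate). All names (`node_times`, `lbmf`, `fitness`, `schedule`) and all probabilities are hypothetical.

```python
import random
from statistics import mean, pstdev


def node_times(assignment, job_sizes, node_speeds):
    """Busy time of each node: assigned work divided by that node's speed."""
    times = [0.0] * len(node_speeds)
    for job, node in enumerate(assignment):
        times[node] += job_sizes[job] / node_speeds[node]
    return times


def lbmf(times):
    """ASSUMED load-balance measure: coefficient of variation of busy times."""
    avg = mean(times)
    return pstdev(times) / avg if avg > 0 else 0.0


def fitness(assignment, job_sizes, node_speeds):
    """ASSUMED fitness (lower is better): makespan penalised by imbalance."""
    times = node_times(assignment, job_sizes, node_speeds)
    return max(times) * (1.0 + lbmf(times))


def schedule(job_sizes, node_speeds, n_particles=30, max_iters=500,
             lbmf_threshold=0.05, seed=0):
    """Search for a job-to-node assignment with a simplified discrete PSO."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(job_sizes), len(node_speeds)
    # Each particle is a list: position d holds the node chosen for job d.
    swarm = [[rng.randrange(n_nodes) for _ in range(n_jobs)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p, job_sizes, node_speeds) for p in swarm]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_iters):
        # Termination criterion from the abstract: stop once LBMF < 5%.
        if lbmf(node_times(gbest, job_sizes, node_speeds)) < lbmf_threshold:
            break
        for i, particle in enumerate(swarm):
            for d in range(n_jobs):
                r = rng.random()
                if r < 0.4:        # social pull: follow the global best
                    particle[d] = gbest[d]
                elif r < 0.7:      # cognitive pull: follow the personal best
                    particle[d] = pbest[i][d]
                elif r < 0.8:      # mutation: try a random node
                    particle[d] = rng.randrange(n_nodes)
                # otherwise keep the current node for this job
            fit = fitness(particle, job_sizes, node_speeds)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = particle[:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = particle[:], fit
    return gbest
```

For example, `schedule([4, 2, 6, 3, 5, 1], [1.0, 2.0, 4.0])` returns a list with one node index per job for three nodes of different speeds. The search may stop at the iteration limit without reaching the 5% threshold, so a caller who needs that guarantee should check `lbmf(node_times(...))` on the result.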
Keywords/Search Tags: Hadoop, particle swarm optimization, load balance, LBMFPSO, job scheduling