Research Of Hadoop Job Scheduling Algorithm In Big Data

Posted on:2016-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:P H Liu

Full Text:PDF

GTID:2308330461991706

Subject:Computer Science and Technology

Abstract/Summary:


With the rise of the mobile Internet, cloud computing, the Internet of Things and other new technologies, the volume, variety, velocity and value of data have reached an unprecedented level in a short time. Data has undergone a qualitative change, and the big data era has formally arrived. Hadoop, a first choice for storing and processing massive data, is an open-source platform that supports distributed processing of huge data sets. Its high efficiency, high reliability, high scalability and fault tolerance have attracted wide attention in industry and academia. The new generation of Hadoop introduces the resource management system YARN. As the core module of Hadoop 2.0, YARN is mainly responsible for resource management and scheduling for all kinds of applications. The quality of the job scheduling algorithm directly affects the overall performance of the Hadoop platform and the utilization of system resources, so the study of job scheduling algorithms is of great significance.

Load balancing is very important in a cluster system. How to distribute computing resources reasonably across the platform and balance the resource load is an important problem that a Hadoop cluster needs to solve. Existing job scheduling algorithms pursue the shortest job completion time without considering the load capacity of the nodes, and they are designed for homogeneous environments, which easily causes load imbalance among cluster nodes in a heterogeneous environment. Multiple jobs then tend to compete for the most capable resources, so that some computing resources are heavily loaded while others sit idle, which seriously degrades the performance of the platform. To address this cluster load imbalance, the Load Balancing Measure Function (LBMF) is proposed.

For the Hadoop load imbalance problem in a heterogeneous environment, Load Balance Measure Function Particle Swarm Optimization (LBMFPSO) is proposed. In LBMFPSO, each job awaiting assignment is treated as a particle, and the node locations form the search space. LBMF and job execution time together serve as the fitness function that guides the distribution of the particles, and the particles' search for the optimal solution is the process of job scheduling. LBMF < 5% is used as the termination criterion of LBMFPSO, so that the scheduling result must satisfy the load-balancing requirement. Experimental results on the Hadoop platform show that scheduling with LBMFPSO shortens job completion time and raises the utilization of system resources. The algorithm thus takes both job execution time and resource load balancing into account.
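
The scheduling idea described above can be illustrated with a small, self-contained sketch. It is not the thesis's implementation: the abstract does not define LBMF, so the `imbalance` function below is an assumed stand-in (the coefficient of variation of per-node busy time), and the fitness formula, the PSO coefficients and all function names are hypothetical. For simplicity the sketch encodes a complete job-to-node assignment as one particle, which may differ from the encoding used in the thesis. Only the 5% stopping threshold is taken from the abstract.

```python
import random
import statistics

def node_times(assignment, job_sizes, node_speeds):
    """Busy time per node, where assignment[j] is the node chosen for job j."""
    times = [0.0] * len(node_speeds)
    for job, node in enumerate(assignment):
        times[node] += job_sizes[job] / node_speeds[node]
    return times

def imbalance(times):
    """Assumed stand-in for LBMF: coefficient of variation of node busy times."""
    mean = statistics.fmean(times)
    return statistics.pstdev(times) / mean if mean > 0 else 0.0

def fitness(assignment, job_sizes, node_speeds, weight=0.5):
    """Lower is better: completion time penalised by the imbalance measure."""
    times = node_times(assignment, job_sizes, node_speeds)
    return max(times) * (1.0 + weight * imbalance(times))

def schedule(job_sizes, node_speeds, particles=20, max_iters=200,
             threshold=0.05, seed=0):
    """Search for an assignment with PSO; stop early once imbalance < threshold."""
    rng = random.Random(seed)
    n_jobs, n_nodes = len(job_sizes), len(node_speeds)
    upper = n_nodes - 1e-9  # keeps int(x) inside 0 .. n_nodes - 1

    def decode(pos):
        # Continuous coordinates are truncated to node indices.
        return [int(x) for x in pos]

    def score(pos):
        return fitness(decode(pos), job_sizes, node_speeds)

    pos = [[rng.uniform(0, upper) for _ in range(n_jobs)] for _ in range(particles)]
    vel = [[0.0] * n_jobs for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_val = [score(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(max_iters):
        # Termination criterion from the abstract: imbalance measure below 5%.
        if imbalance(node_times(decode(g_pos), job_sizes, node_speeds)) < threshold:
            break
        for i in range(particles):
            for d in range(n_jobs):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO update: inertia, personal pull, global pull.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), upper)
            val = score(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return decode(g_pos)
```

Calling `schedule([4, 2, 2, 6, 3], [2.0, 1.0, 1.0])` returns a list with one node index per job. Heterogeneity enters only through `node_speeds`, so a faster node can absorb more work before its busy time exceeds that of the others. If the 5% threshold is never reached, the search simply stops after `max_iters` iterations and returns the best assignment found.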

Keywords/Search Tags:

Hadoop, particle swarm optimization, load balance, LBMFPSO, job scheduling


Related items

1	Task Scheduling Optimization Based On Time And Load Balance Under The Hadoop Platform
2	Research On The Load Balance Strategies Of Cloud Computing Federation Based On The Particle Swarm Optimization Algorithm
3	Study On Particle Swarm Optimization Algorithm For Multiple Application Scenarios
4	Research On CPS Task Scheduling Algorithm Based On Improved Particle Swarm
5	CH Smart TV Assembly Line Balance Based On Improved Particle Swarm Optimization
6	Research And Improvement Of Job Scheduling Algorithms On Hadoop Platform
7	Research And Implementation Of LVS Cluster Load Balancing Scheduling Algorithm Based On PSO-GA
8	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop
9	Design And Implementation Of Load Scheduling Method For Programmable Wireless Access Points
10	Research On Dynamic Job Scheduling Based On Hadoop Heterogeneous Cluster