Study On Computing Task Scheduling Optimization Based On Hadoop Job

Posted on:2017-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:X Xiao

Full Text:PDF

GTID:2308330485488111

Subject:Computer software and theory

Abstract/Summary:


With the rapid development of information science, the Internet has become connected with every aspect of social life, and the data generated on it is growing at an exponential rate. Traditional computing models can hardly meet the processing requirements of such massive data. Cloud computing, which grew out of traditional computing models combined with network technology, can process massive data efficiently across many distributed machines. Hadoop, a distributed computing framework that processes large-scale datasets efficiently, is increasingly adopted by institutions as the basic computing framework of their cloud computing platforms. Improving Hadoop's execution efficiency has therefore become an active research topic, and because the scheduling algorithm is a critical factor affecting that efficiency, improving it is of considerable significance.

A review of existing optimizations of Hadoop's scheduling algorithm shows that most of them focus on scheduling reasonably among multiple jobs, while few address the scheduling of computing tasks within a Hadoop job. In addition, the computing capacity of nodes in a heterogeneous cluster is either not fully considered or set to a theoretical value derived only from the machine configuration, so it can diverge from the nodes' actual performance. This thesis therefore studies computing task scheduling within a Hadoop job. The main work consists of the following two parts.

First, we introduce the background of this subject and the Hadoop components involved in scheduling. We also analyze the disadvantages of Hadoop's default task scheduling algorithm and the functions of the related classes and methods in the task scheduling process.
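To make the weakness of undifferentiated block selection concrete, the following Python sketch models a locality-first assignment in which an idle node takes one of its unprocessed local blocks at random, as the thesis describes for traditional task scheduling, and otherwise takes any remaining block remotely. It is a simplified model, not Hadoop's code: the heartbeat is approximated by a node requesting work as soon as it becomes idle, and the function names, node speeds, and the 1-second remote-read penalty are hypothetical.

```python
from __future__ import annotations

import heapq
import random

REMOTE_PENALTY = 1.0  # hypothetical extra seconds for reading a block over the network


def default_assign(pending: set[str], replicas: dict[str, set[str]], node: str,
                   rng: random.Random) -> tuple[str, bool] | None:
    """Locality-first pick for idle `node`: a random local block if any,
    otherwise a random remaining block. Returns (block, is_local) or None."""
    if not pending:
        return None
    local = sorted(b for b in pending if node in replicas[b])
    # Local blocks are treated as interchangeable, so one is drawn at random.
    block = rng.choice(local if local else sorted(pending))
    pending.remove(block)
    return block, bool(local)


def simulate(replicas: dict[str, set[str]], secs_per_block: dict[str, float],
             seed: int = 0) -> tuple[int, float]:
    """Run one map phase; return (number of data-local tasks, makespan in seconds)."""
    rng = random.Random(seed)
    pending = set(replicas)
    # Each entry is (time the node becomes idle, node name); idle nodes ask for work.
    idle = [(0.0, n) for n in sorted(secs_per_block)]
    heapq.heapify(idle)
    local_count, makespan = 0, 0.0
    while idle:
        t, node = heapq.heappop(idle)
        pick = default_assign(pending, replicas, node, rng)
        if pick is None:  # no work left: this node is finished
            makespan = max(makespan, t)
            continue
        _, is_local = pick
        local_count += is_local
        done = t + secs_per_block[node] + (0.0 if is_local else REMOTE_PENALTY)
        heapq.heappush(idle, (done, node))
    return local_count, makespan
```

For example, with block `x` stored only on node `A` and block `y` on both `A` and `B`, whether both tasks run data-locally depends on which local block `A` happens to draw first: drawing `y` leaves `B` no local work, so `B` must read `x` remotely and `simulate` returns `(1, 2.0)` instead of `(2, 1.0)`.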
After analyzing the main ideas, design, advantages, and disadvantages of several current improved scheduling algorithms, this thesis proposes a data-locality task scheduling algorithm for Hadoop. The algorithm calculates the data-locality saturation level of each node and schedules computing tasks according to the node's actual computing performance and the number of unprocessed data blocks it currently stores. In traditional task scheduling, the data blocks stored on a node are not distinguished, and one block is selected at random at each scheduling step. This thesis introduces the concept of a data block label: every data block is labeled during scheduling, and blocks are then scheduled according to their label values. The proposed algorithm improves the efficiency of computing task scheduling within a Hadoop job. Combined with multiple-job scheduling algorithms, it can further improve the efficiency of the Hadoop platform, and it performs well even in heterogeneous clusters.

Second, we build a heterogeneous Hadoop cluster as the experimental environment, run experiments with both the optimized task scheduling algorithm and the default task scheduling algorithm, and compare and analyze the results. The results show that the optimized algorithm increases the number of data-local computing tasks, which reduces network bandwidth usage, uses system resources more effectively, and shortens the overall job running time.
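The label-based scheduling described above can be sketched as follows. The abstract does not give the algorithm's formulas, so every definition here is an assumption for illustration only: a node's saturation is taken as the time it needs to drain its unprocessed local blocks at its measured speed, a block's label is the number of nodes holding a replica, an idle node prefers the local block with the smallest label, and a node with no local work steals the block whose holders are most saturated, but only when it would finish that block sooner. All names and the 1-second remote-read penalty are hypothetical.

```python
from __future__ import annotations

import heapq

REMOTE_PENALTY = 1.0  # hypothetical extra seconds for reading a block over the network


def saturation(node: str, pending: set[str], replicas: dict[str, set[str]],
               secs_per_block: dict[str, float]) -> float:
    """Hypothetical saturation: seconds `node` needs to drain its unprocessed
    local blocks at its measured per-block speed."""
    backlog = sum(1 for b in pending if node in replicas[b])
    return backlog * secs_per_block[node]


def label(block: str, replicas: dict[str, set[str]]) -> int:
    """Hypothetical block label: number of nodes holding a replica."""
    return len(replicas[block])


def label_assign(pending: set[str], replicas: dict[str, set[str]], node: str,
                 secs_per_block: dict[str, float]) -> tuple[str, bool] | None:
    """Pick a block for idle `node`; return (block, is_local), or None to stop."""
    local = [b for b in pending if node in replicas[b]]
    if local:
        # Smallest label first: blocks with few replicas are hardest to keep local.
        block = min(local, key=lambda b: (label(b, replicas), b))
        pending.remove(block)
        return block, True
    if not pending:
        return None

    def victim_load(b: str) -> float:
        # Time until the least-loaded holder of `b` could drain its backlog.
        return min(saturation(h, pending, replicas, secs_per_block)
                   for h in replicas[b])

    block = max(sorted(pending), key=victim_load)
    # Steal only if this node, including the remote-read cost, finishes sooner.
    if secs_per_block[node] + REMOTE_PENALTY >= victim_load(block):
        return None
    pending.remove(block)
    return block, False


def simulate(replicas: dict[str, set[str]],
             secs_per_block: dict[str, float]) -> tuple[int, float]:
    """Run one map phase; return (number of data-local tasks, makespan in seconds)."""
    pending = set(replicas)
    # Each entry is (time the node becomes idle, node name).
    idle = [(0.0, n) for n in sorted(secs_per_block)]
    heapq.heapify(idle)
    local_count, makespan = 0, 0.0
    while idle:
        t, node = heapq.heappop(idle)
        pick = label_assign(pending, replicas, node, secs_per_block)
        if pick is None:  # no work left, or stealing is not worthwhile
            makespan = max(makespan, t)
            continue
        _, is_local = pick
        local_count += is_local
        done = t + secs_per_block[node] + (0.0 if is_local else REMOTE_PENALTY)
        heapq.heappush(idle, (done, node))
    return local_count, makespan
```

With block `x` stored only on node `A` and block `y` on both `A` and `B`, `A` always takes `x` first (label 1), which leaves `y` local for `B`, and `simulate` returns `(2, 1.0)`. Because a node's local backlog only shrinks, a node that declines to steal once will never find a more attractive block later, so the sketch lets it stop asking for work.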

Keywords/Search Tags:

Hadoop, data localization, task scheduling, heterogeneous cluster


Related items

1	Optimization And Research Of Hadoop Scheduling Algorithm In Hadoop Heterogeneous Environment
2	Research And Improvement Of Job Scheduling Algorithm Based On Hadoop Cluster
3	Research On Hadoop Cluster Scheduling Optimization
4	Research On Scheduling Optimization In Heterogeneous Hadoop Clusters Based On Dynamically Adjusting Node Resource
5	Research On Task Scheduling Algorithm In Heterogeneous Cloud Environment
6	Research And Improvement Of Task Scheduling Algorithm In Hadoop
7	Design Of MapReduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
8	SLA-based Adaptive Job Scheduling In Heterogeneous Hadoop Clusters
9	Research On Construction Of ETL Cluster Model Based On Task Scheduling
10	Research Of Task Scheduling Strategy For Heterogeneous Cluster In Spark Computing Environment