Research On Scheduling Strategy Based On Hadoop

Posted on:2017-04-02

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Yan

Full Text:PDF

GTID:2348330533950190

Subject:Computer technology

Abstract/Summary:

In the era of big data, traditional computing and storage capacity can no longer meet the growing demand, and Cloud Computing technology has emerged in response. Hadoop, an open-source implementation derived from Google's Cloud Computing technology, has become a top-level Apache project and provides the backbone of the big data era. However, as Hadoop develops and cluster scale expands quickly, cluster resources (network, storage and other resources) are becoming a bottleneck of the Hadoop system. Research on task scheduling improves Hadoop performance from the perspective of resource allocation and management. Combining data locality with SDN (Software-Defined Networking), this thesis improves the scheduling strategies for ReduceTasks and for LATE's speculative tasks.

1. An improved ReduceTask scheduling strategy based on data locality. During the MapReduce stage, the cluster network carries two main data streams: slow task migration and remote copies of data. When these two bursty transfers overlap, they can easily become a bottleneck of the cluster network. To reduce the amount of remotely copied data, we take data locality into account and establish a minimum network resource consumption model (MNRC), which calculates the network resource consumption of a ReduceTask. Based on this model, we design a delay priority scheduling policy for ReduceTasks that is driven by the cost of network resource consumption. Finally, MNRC is verified by simulation experiments. Evaluation results show that MNRC reduces cluster network resource consumption by an average of 7.5% in heterogeneous environments.

2. An improved speculative task scheduling strategy based on SDN technology. In the LATE mechanism, some speculative tasks do not finish sooner than the slow tasks they back up. Such speculation fails to reduce task turnaround time and wastes system resources.
This thesis adds a comparison between each slow task and its speculative task to LATE's speculative task scheduling strategy. Here, the run time of a speculative task includes the transfer time of its input data, which depends on the real-time bandwidth of the corresponding link. Based on this model, we propose an SDN-based bandwidth-aware run time estimation model for speculative tasks (BWRE) and use it to estimate the run time of backup tasks accurately. We also use SDN to provide bandwidth guarantees for speculative tasks. Finally, BWRE is verified by simulation experiments. Evaluation results show that BWRE shortens job turnaround time by an average of 9.85%.
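
The abstract does not give MNRC's formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions: a ReduceTask's network cost on a candidate node is taken as the bytes of its map-output partition held on each map node multiplied by a hypothetical topology distance (0 for the same node, 1 for the same rack, 2 across racks), and a delay-scheduling-style rule waits for a low-cost node for at most `max_skips` scheduling opportunities. The names `network_cost`, `should_launch`, `slack` and `max_skips` are illustrative, not taken from the thesis or from Hadoop.

```python
from typing import Dict, List

# Hypothetical distance weights for a simple two-level (node/rack) topology.
SAME_NODE, SAME_RACK, OFF_RACK = 0, 1, 2


def distance(a: str, b: str, rack_of: Dict[str, str]) -> int:
    """Network distance between two nodes under the assumed topology."""
    if a == b:
        return SAME_NODE
    if rack_of[a] == rack_of[b]:
        return SAME_RACK
    return OFF_RACK


def network_cost(candidate: str, partition_bytes: Dict[str, int],
                 rack_of: Dict[str, str]) -> int:
    """Bytes x distance needed to pull this reducer's input to `candidate`.

    partition_bytes maps each map node to the bytes of map output it holds
    for this particular ReduceTask (an assumed cost model, not MNRC itself).
    """
    return sum(size * distance(src, candidate, rack_of)
               for src, size in partition_bytes.items())


def should_launch(free_node: str, nodes: List[str],
                  partition_bytes: Dict[str, int], rack_of: Dict[str, str],
                  skipped: int, max_skips: int, slack: float = 1.1) -> bool:
    """Delay-style rule: accept the free node if its cost is within `slack`
    of the cheapest node; otherwise wait, but never more than `max_skips`."""
    best = min(network_cost(n, partition_bytes, rack_of) for n in nodes)
    cost = network_cost(free_node, partition_bytes, rack_of)
    return cost <= best * slack or skipped >= max_skips
```

For example, with racks {n1, n2} and {n3} and partition sizes of 300, 100 and 50 bytes on n1, n2 and n3, the costs are 200 (n1), 400 (n2) and 800 (n3), so a free slot on n3 is declined until the ReduceTask has been skipped `max_skips` times. Bounding the wait keeps a task from starving when the cheapest node stays busy.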

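The comparison in item 2 can likewise be sketched in Python. The time-left estimate below follows LATE (remaining progress divided by progress rate); the backup estimate adds an input-transfer term computed from link bandwidth, which in the thesis would come from SDN, to a processing term. The processing term `input_bytes / process_rate` and all function names are assumptions for illustration; BWRE's actual model may differ.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    progress: float  # LATE progress score in [0, 1]
    elapsed: float   # seconds since the task started


def late_time_left(task: RunningTask) -> float:
    """LATE's estimate: (1 - progress) / rate, where rate = progress / elapsed."""
    if task.progress <= 0 or task.elapsed <= 0:
        return float("inf")  # no measurable progress yet
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def backup_run_time(input_bytes: float, link_bandwidth: float,
                    process_rate: float) -> float:
    """Hypothetical bandwidth-aware estimate: transfer the input over the
    link, then process it. link_bandwidth (bytes/s) is assumed to be
    reported by an SDN controller; here it is just a parameter."""
    return input_bytes / link_bandwidth + input_bytes / process_rate


def worth_speculating(task: RunningTask, input_bytes: float,
                      link_bandwidth: float, process_rate: float) -> bool:
    """Launch a backup only if it is expected to finish before the original."""
    backup = backup_run_time(input_bytes, link_bandwidth, process_rate)
    return backup < late_time_left(task)
```

For a task at 20% progress after 60 s, LATE estimates about 240 s left. A backup reading 1 GB and processing it at 10 MB/s needs 100 s of compute, so it is worth launching over a 50 MB/s link (120 s total) but not over a 5 MB/s link (300 s total).
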
Keywords/Search Tags:

Hadoop, task scheduling, data-locality, SDN, LATE

Related items

1	Research On Data Locality Of Hadoop Task Scheduling
2	Data Locality-awared Task Scheduler For Hadoop
3	Research And Improvement Of Task Scheduling Algorithm In Hadoop
4	The Research On Distributed Task Scheduling Algorithms Based On Hadoop Platform
5	Hadoop Job Scheduling Research And Optimization About Data Locality
6	The Research On High Performance Task Scheduling Technology Based On Mapreduce In Cloud Computing
7	Hadoop Task Scheduling Algorithm Optimization About Data Locality
8	Research On Cloud Task Scheduling Algorithms Based On Mapreduce
9	Task Scheduling Research And Application Of Big Data In Distributed Environment
10	Research On Task Scheduling Algorithms Based On Pre-Release Resource List In Hadoop