Research On Scheduling Strategy Based On Hadoop

Posted on: 2017-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yan
GTID: 2348330533950190
Subject: Computer technology
Abstract/Summary:
In the era of big data, traditional computing and storage capacity can no longer meet growing demand, and cloud computing technology has emerged in response. Hadoop, an open-source implementation derived from Google's cloud computing technology, has become a top-level Apache project and provides the backbone for the big-data era. However, as Hadoop has developed and cluster scale has expanded quickly, cluster resources (network, storage, and others) have become a system bottleneck. Research on task scheduling approaches this problem from the perspective of resource allocation and management in order to improve Hadoop system performance. This thesis combines data locality and SDN to improve the scheduling strategies for ReduceTasks and for LATE's speculative tasks.

1. An improved ReduceTask scheduling strategy based on data locality. In the MapReduce stage, there are two main data streams in the cluster network: slow-task migration and remote copies of data. When these two bursty transfers overlap, they can easily become a bottleneck of the cluster network. To reduce the amount of remotely copied data, we use data locality to establish a minimum network resource consumption model (MNRC), which calculates the network resource consumption of a ReduceTask. Based on this model, we design a delay priority scheduling policy for ReduceTasks that is driven by the cost of network resource consumption. MNRC is verified by simulation experiments. The evaluation results show that MNRC saves an average of 7.5% of cluster network resources in heterogeneous environments.

2. An improved speculative task scheduling strategy based on SDN. Under the LATE mechanism, some speculative tasks finish later than the slow tasks they are meant to replace. Such a speculative task fails to reduce task turnaround time and also wastes system resources. In this thesis, we add a comparison between the slow task and the speculative task to LATE's speculative task scheduling strategy. The run time of a speculative task includes its input data transfer time, which depends on the real-time bandwidth of the corresponding link. On this basis, we propose an SDN-based bandwidth-aware speculative task run time estimation model (BWRE) and use it to estimate the run time of the backup task accurately. We also use SDN to provide bandwidth guarantees for the speculative task. BWRE is verified by simulation experiments. The evaluation results show that BWRE shortens job turnaround time by an average of 9.85%.
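The comparison between a slow task and its speculative backup can be illustrated with a minimal Python sketch. The abstract does not give the BWRE formulas, so everything here is an assumption made for illustration: the names `RunningTask`, `late_time_left`, `backup_run_time`, and `should_speculate` are hypothetical, the backup estimate is simply transfer time plus an assumed compute time, and the remaining-time estimate is the usual LATE heuristic of remaining progress divided by observed progress rate.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    """Hypothetical view of a slow task as seen by the scheduler."""
    progress: float   # fraction of work completed, between 0 and 1
    elapsed_s: float  # seconds since the task started


def late_time_left(task: RunningTask) -> float:
    """LATE-style estimate: remaining progress divided by progress rate."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        # No measurable progress yet, so the remaining time is unknown.
        return float("inf")
    progress_rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / progress_rate


def backup_run_time(input_bytes: float,
                    link_bw_bytes_per_s: float,
                    compute_s: float) -> float:
    """Assumed bandwidth-aware estimate: input transfer time plus compute time.

    link_bw_bytes_per_s stands in for the real-time link bandwidth that an
    SDN controller could report; compute_s is an assumed processing time.
    """
    if link_bw_bytes_per_s <= 0.0:
        return float("inf")
    return input_bytes / link_bw_bytes_per_s + compute_s


def should_speculate(task: RunningTask,
                     input_bytes: float,
                     link_bw_bytes_per_s: float,
                     compute_s: float) -> bool:
    """Launch a backup only if it is expected to finish before the slow task."""
    backup_s = backup_run_time(input_bytes, link_bw_bytes_per_s, compute_s)
    return backup_s < late_time_left(task)
```

With these assumptions, a task that is 20% done after 100 seconds has roughly 400 seconds left. A backup that must fetch 1 GB of input would be worth launching over a 10 MB/s link (about 300 seconds in total with 200 seconds of compute), but not over a 1 MB/s link, where the transfer alone takes longer than the slow task needs to finish.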
Keywords/Search Tags: Hadoop, task scheduling, data-locality, SDN, LATE