
Research On Scheduling Method In Data Localization Of MapReduce

Posted on: 2015-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: M X Tang
Full Text: PDF
GTID: 2348330509960889
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, cloud computing has become one of the hottest technologies in IT. It is now a cornerstone for IT companies: the development of the Internet, smartphones, GPS, and other mobile terminals is closely tied to cloud computing. The emergence of MapReduce has brought cloud computing into a new stage and has had a profound impact on academia and industry. MapReduce is a computational framework for large-scale data processing. Data localization is an important design principle of MapReduce and an important objective of MapReduce task scheduling. To improve system performance, this paper studies MapReduce scheduling strategies aimed at data localization.

This paper studies the task scheduling mechanism in MapReduce, describes the limitations of its algorithm for selecting non-local tasks, and proposes a task scheduling method based on node load. Tasks are scheduled dynamically according to an evaluation of each node's load, which balances load across nodes, reduces data migration, and speeds up job execution. Experimental results show that this method significantly reduces the number of non-local tasks, thereby reducing the amount of data migration and improving system performance.

This paper examines the task execution process of MapReduce, analyses why non-local tasks reduce system efficiency, and proposes a data prefetching technique based on overlapped task scheduling. By introducing a 'pre-scheduling' state for non-local tasks, the two phases of task execution, data fetching and data processing, can be overlapped. Remote data access is thereby hidden, the execution time of non-local tasks is shortened, and resource utilization is improved. Experimental results show that this technique significantly reduces the execution time of non-local tasks and improves the resource utilization of the system.

This paper studies the fault-tolerance mechanisms of MapReduce and the process of job initialization. When a job is initialized, copies of tasks are generated according to the copies of data that the HDFS redundancy mechanism keeps in the MapReduce system, and a new scheduling mechanism based on these task copies is proposed. The processing progress of each node is tracked, so that a node making faster progress receives higher priority in using task copies, and good data locality is achieved. Experimental results show that this mechanism reduces the number of non-local tasks and improves task execution speed.
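
The task-copy idea in the last paragraph can be illustrated with a small sketch. It is not the thesis's implementation: the abstract gives no algorithmic detail, so the function names (`build_task_copies`, `schedule_round`), the per-node `progress` score, and the rule that claiming one copy invalidates its siblings are all assumptions made for illustration. The sketch creates one task copy per replica of each data block and lets nodes with faster progress pick their local copies first.

```python
def build_task_copies(block_replicas):
    """Create one task copy per (block, replica node) pair.

    block_replicas: dict mapping a block id to the nodes holding a replica
    (hypothetical stand-in for what HDFS would report at job initialization).
    Returns a dict mapping each node to the blocks it could process locally.
    """
    copies = {}
    for block, nodes in block_replicas.items():
        for node in nodes:
            copies.setdefault(node, []).append(block)
    return copies


def schedule_round(copies, progress, claimed):
    """Assign at most one local task per node, faster nodes first.

    progress: dict node -> progress score (assumed: higher means faster).
    claimed: set of blocks already assigned; updated in place, so the
    sibling copies of a claimed block are skipped by other nodes.
    """
    assignment = {}
    for node in sorted(progress, key=progress.get, reverse=True):
        for block in copies.get(node, []):
            if block not in claimed:
                claimed.add(block)
                assignment[node] = block
                break
    return assignment


# Hypothetical cluster: three blocks, each replicated on two of three nodes.
block_replicas = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"]}
progress = {"n1": 0.2, "n2": 0.9, "n3": 0.5}

copies = build_task_copies(block_replicas)
claimed = set()
print(schedule_round(copies, progress, claimed))
# → {'n2': 'b1', 'n3': 'b2', 'n1': 'b3'}
```

In this toy example every node receives a task whose input block it already stores, so no data has to be moved. A real scheduler would also have to handle nodes with no unclaimed local copy, which this sketch simply leaves idle for the round.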
Keywords/Search Tags:MapReduce, Load, Prefetching, Duplicate