
Scheduling Optimization Research For MapReduce

Posted on: 2017-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: S L Gao
Full Text: PDF
GTID: 2308330485981022
Subject: Computer system architecture
Abstract/Summary:
The era of big data has arrived together with the Internet era. Efficient processing of large-scale data is of great significance to production and practice, and traditional computing models can no longer meet the requirements of mass data processing. Hadoop, composed of HDFS and MapReduce and inspired by the GFS and MapReduce designs published by Google in 2004, has been widely used for mass data processing. As the core component responsible for distributed processing, MapReduce has made its performance a hot research issue. Through extensive research and experiments, we find that data locality is an important factor affecting the performance of MapReduce; it also affects the network bandwidth consumption of the cluster and the execution efficiency of its nodes. This thesis therefore takes data locality as the entry point for improving execution efficiency.

Data locality means that computation happens where the source data is located. This thesis analyzes the scheduling mechanism of MapReduce in depth and finds that the coarse-grained rules MapReduce uses to select data blocks and computing nodes lead to a low degree of data locality. By abstracting task scheduling and resource distribution, the thesis proposes two efficient task scheduling algorithms, Bolas and Bolas+, to solve this problem and improve job execution efficiency.

This thesis makes novel contributions to MapReduce scheduling optimization. Bolas abstracts task scheduling as weighted optimal bipartite graph matching, solving the mismatch between computing nodes and data blocks. Bolas+ proposes a lightweight scheduling strategy based on data block marking, in which data blocks and nodes are considered at a finer granularity during scheduling. A large number of experiments show that Bolas can raise data locality to 100% and reduce total job execution time by up to 15%.
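The bipartite-matching formulation can be illustrated with a tiny sketch (this is an illustration, not the thesis's implementation): tasks form one side of the graph, nodes the other, and edge weights encode placement cost. The example below assumes hypothetical costs of 0 for a node-local block, 1 for rack-local, and 2 for remote, and finds the minimum-cost assignment by brute force over permutations, which is feasible only for tiny inputs; the Hungarian algorithm solves the same problem in O(n³).

```python
from itertools import permutations

# Cost of running each task (row) on each node (column):
# 0 = node-local, 1 = rack-local, 2 = remote (hypothetical weights).
cost = [
    [0, 2, 2],
    [2, 0, 1],
    [1, 2, 0],
]

def optimal_assignment(cost):
    """Exhaustively find the minimum-cost perfect matching between
    tasks and nodes. Fine for a tiny illustration; a real scheduler
    would use the O(n^3) Hungarian algorithm instead."""
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[task][node] for task, node in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

total, assignment = optimal_assignment(cost)
print(total, assignment)  # total cost 0: every task lands on its local node
```

Here the optimal matching places every task on the node holding its data block, which is exactly the 100% data-locality outcome the thesis reports for Bolas.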
The average data locality of Bolas+ remains above 95% and tends to 100% as jobs grow larger, while job execution efficiency improves by 15% on average. The computational complexity of Bolas is O(n³), so its response time becomes a burden as jobs grow. Bolas+ solves this problem gracefully with a computational complexity of O(n/m), where n is the number of data blocks and m is the number of computing nodes.
Keywords/Search Tags:MapReduce, Data-locality, Task Scheduling, Bipartite graph matching, Data block marking