Hadoop Job Scheduling Research And Optimization About Data Locality

Posted on:2016-02-24

Degree:Master

Type:Thesis

Country:China

Candidate:R F Chen

Full Text:PDF

GTID:2308330467472494

Subject:Computer Science and Technology

Abstract/Summary:

Computing has moved from an era centered on equipment to one centered on services, and the cloud has become a focus of the IT field. Hadoop, an open-source distributed computing framework from the Apache Software Foundation, was originally inspired by MapReduce and the Google File System developed at Google. It is a software framework that processes large volumes of data in a reliable, efficient, scalable and distributed way. Hadoop consists of a distributed file system (HDFS) and MapReduce, a distributed computing framework. MapReduce is built on top of the distributed file system and computes over the data stored in it. Within this architecture, the job scheduling algorithm plays an important role in the performance of Hadoop job execution. To reduce the cost of network transfer while a job runs, a task is scheduled to a node that stores its input data; that is, the task is computed locally.

All three native Hadoop task scheduling algorithms share the same task selection policy, which selects local tasks first. Sometimes, however, Hadoop selects a non-local task, for example when one node is idle while all the others are busy. In addition, Hadoop does not consider task locality when rescheduling failed map tasks. To address these data locality problems in the native Hadoop task scheduling algorithms, this thesis presents an improvement based on inter-block resource prefetching. Before a non-local task executes, its input data block is prefetched to the target node while another task is still running there, so that the task can then be scheduled as a local one.

Using a small Hadoop experimental platform, this thesis compares the native Hadoop scheduling algorithms with the improved job scheduler. The experiments indicate that the improved algorithm increases data locality and, to some extent, reduces job execution time.
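The locality-first selection policy and the prefetching idea described above can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the thesis's scheduler and not Hadoop's API: it simulates a single node, identifies each task with the block it reads, and treats prefetching as completing instantly. All names (`Node`, `pick_task`, `prefetch_next`, `run`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a worker node; not a Hadoop class.
    name: str
    blocks: set = field(default_factory=set)  # ids of blocks stored on this node


def pick_task(node, pending):
    """Locality-first selection: prefer a task whose block is on `node`.

    Returns (block_id, is_local), or None when no task is pending.
    """
    for block in pending:
        if block in node.blocks:
            pending.remove(block)
            return block, True
    if pending:
        # No local task left: fall back to a non-local one.
        return pending.pop(0), False
    return None


def prefetch_next(node, pending):
    """While `node` is busy, copy in the block it would otherwise read remotely.

    Assumption: the copy finishes before the current task does.
    """
    if not pending or any(block in node.blocks for block in pending):
        return None  # nothing pending, or a local task is still available
    block = pending[0]
    node.blocks.add(block)
    return block


def run(node, pending, prefetch):
    """Drain the queue on one node; return (local_count, non_local_count)."""
    node = Node(node.name, set(node.blocks))  # work on copies
    pending = list(pending)
    local = remote = 0
    while True:
        picked = pick_task(node, pending)
        if picked is None:
            break
        _, is_local = picked
        if is_local:
            local += 1
        else:
            remote += 1
        if prefetch:
            prefetch_next(node, pending)
    return local, remote
```

With a node that stores only block `b1` and a queue of `["b1", "b2", "b3"]`, `run(..., prefetch=False)` returns `(1, 2)`, while `run(..., prefetch=True)` returns `(3, 0)`. A real cluster has many nodes, finite bandwidth and replicated blocks, so the gain there would depend on whether the prefetch can finish in time.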

Keywords/Search Tags:

Hadoop, Cloud Computing, Locality, Job Scheduling, Failed Map, DataPrefetch

Related items

1	Research On Cloud Task Scheduling Algorithms Based On Mapreduce
2	Research Of Localization Computing Strategy Based On Hadoop Platform
3	The Research On High Performance Task Scheduling Technology Based On Mapreduce In Cloud Computing
4	Research Of The Job Scheduling Algorithm On Hadoop Cloud Platform
5	Research Of Cloud Logistics Scheduling System Based On Hadoop
6	Study On Hadoop Resource Scheduling Strategy Based On IaaS Cloud Platform
7	Research Of Job Scheduling On Cloud Computing
8	Research On Job Scheduling Algorithm For Forest Resource Information In Cloud Computing
9	Research On Job Scheduling Method Under Hadoop Platform
10	The Research And Implementation Of Hadoop Scheduling Algorithm