Hadoop Job Scheduling Research And Optimization About Data Locality

Posted on: 2016-02-24  Degree: Master  Type: Thesis
Country: China  Candidate: R F Chen  Full Text: PDF
GTID: 2308330467472494  Subject: Computer Science and Technology
Abstract/Summary:
Computing has moved from an era centered on equipment to one centered on services, and the cloud is becoming a focus of the IT field. Hadoop, an open-source distributed computing framework from the Apache Software Foundation, was originally inspired by MapReduce and the Google File System developed at Google. It is a software framework that processes large volumes of data in a reliable, efficient, scalable, and distributed way. Hadoop consists of the Hadoop Distributed File System (HDFS) and MapReduce, a distributed computing framework. MapReduce is built on top of the distributed file system and computes over the data stored there. Within this architecture, the job scheduling algorithm plays an important role in the performance of Hadoop job execution. To reduce network transmission cost during a job, a task is scheduled onto a node that stores its input data; that is, the task is computed locally.

All three native Hadoop task scheduling algorithms share the same task selection policy, which selects a local task first. Sometimes, however, Hadoop selects a non-local task, for example when one node is idle while all the others are busy. In addition, Hadoop does not consider task locality for failed map tasks. To address these data locality problems in the native Hadoop scheduling algorithms, this thesis presents an improvement based on inter-block resource prefetching: before a non-local task executes, its input data block is prefetched while another task is running, so that Hadoop tasks are scheduled locally.

A small Hadoop experimental platform was set up to compare the native Hadoop scheduling algorithms with the improved job scheduler. The experiments indicate that the improved algorithm increases data locality and, to some extent, reduces job execution time.
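
The locality-first selection policy and the prefetching idea described above can be illustrated with a toy, single-node simulation. This is only a sketch under stated assumptions and not the scheduler implemented in the thesis. The names `Task`, `Node`, `pick_task`, `prefetch_if_needed`, and `run` are hypothetical, a block transfer is modeled as simply adding the block name to the node's set, and failed maps and multi-node contention are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    block: str  # name of the input data block this task reads


@dataclass
class Node:
    name: str
    blocks: set = field(default_factory=set)  # blocks stored (or prefetched) here


def pick_task(node, pending):
    """Locality-first selection: prefer a task whose input block is on `node`.

    Falls back to the first pending task (a non-local task) if none is local.
    Returns (task, is_local), or (None, False) when nothing is pending.
    """
    for task in pending:
        if task.block in node.blocks:
            return task, True
    if pending:
        return pending[0], False
    return None, False


def prefetch_if_needed(node, pending):
    """Hypothetical prefetch step, assumed to overlap with the running task.

    If no pending task is local to `node`, copy the block of the next pending
    task so that it becomes local before it is scheduled.
    """
    if not pending or any(t.block in node.blocks for t in pending):
        return None
    block = pending[0].block
    node.blocks.add(block)  # stands in for a network transfer of the block
    return block


def run(node, tasks, prefetch):
    """Run all tasks on one node and count how many were scheduled locally."""
    pending = list(tasks)
    local_count = 0
    while pending:
        task, is_local = pick_task(node, pending)
        pending.remove(task)
        local_count += int(is_local)
        if prefetch:
            prefetch_if_needed(node, pending)
    return local_count


if __name__ == "__main__":
    tasks = [Task("t1", "b1"), Task("t2", "b2"), Task("t3", "b3")]
    print(run(Node("n1", {"b1"}), tasks, prefetch=False))  # prints 1
    print(run(Node("n1", {"b1"}), tasks, prefetch=True))   # prints 3
```

In this toy setting, only the first task is local without prefetching, while with prefetching every task finds its block on the node by the time it is selected. A real cluster would also have to account for transfer time, disk space, and competing nodes.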
Keywords/Search Tags: Hadoop, Cloud Computing, Locality, Job Scheduling, Failed Map, Data Prefetch