
MapReduce Simulation and the Research of Fair Share Scheduling Algorithm in Hadoop

Posted on: 2014-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Li
Full Text: PDF
GTID: 2268330425466341
Subject: Computer application technology
Abstract/Summary:
At present, with the development of distributed computing and grid computing, cloud computing has become a research hotspot in the IT field, covering distributed clusters, virtualization technology, user service levels, and so on. Hadoop is a heavyweight, distributed, open-source framework, and many companies use it as their core cloud computing platform.

This thesis focuses on Hadoop's implementation framework, the structure of a Hadoop cluster deployed on virtual machines, and Hadoop's scheduling strategy. The emphasis is on the study and implementation of fair share scheduling. User orientation is one of the key characteristics of cloud computing. To keep users satisfied, an important question arises: how to divide resources fairly among different users. In addition, the chance of node failure in a cluster is high, at least for large clusters, so node failures undoubtedly affect service performance.

We introduce the key technologies of cloud computing, including Hadoop and virtualization technology, and in particular the MapReduce programming model and three scheduling algorithms in Hadoop. This thesis mainly discusses three studies on Hadoop. First, we developed a simulated MapReduce model in C++, and then deployed Hadoop on a virtualized cluster of nine nodes. With this cluster, we evaluated the performance of MapReduce jobs under two scheduling algorithms: FIFO scheduling and fair share scheduling. We designed four groups of experiments in which MapReduce jobs run under four different schemes. The conclusions cover four aspects. First, FIFO scheduling performs better than fair share scheduling when there is only a single job. Second, fair share scheduling divides cluster resources fairly among jobs when many jobs run simultaneously. Third, the relatively optimal delay time in the improved fair share scheduling is 1.5 times the heartbeat time. Finally, a single node failure seriously degrades job performance.
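
The contrast between FIFO and fair share scheduling described in the abstract can be illustrated with a minimal sketch. It is not the thesis's C++ simulator and not Hadoop's scheduler code: the function names `fifo_allocate` and `fair_allocate`, the idea of expressing each job's demand as a number of task slots, and the round-robin max-min division are all assumptions made for illustration. Hadoop's actual fair scheduler also handles pools, weights, and minimum shares, which are left out here.

```python
def fifo_allocate(demands, total_slots):
    """Hypothetical FIFO policy: serve jobs in submission order.

    Each job takes as many slots as it wants until the cluster is full,
    so later jobs may receive nothing.
    """
    allocation = []
    remaining = total_slots
    for demand in demands:
        granted = min(demand, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


def fair_allocate(demands, total_slots):
    """Hypothetical fair share policy (max-min fairness).

    Slots are handed out one at a time, round-robin, to every job that
    still wants more, until slots or demand run out.
    """
    allocation = [0] * len(demands)
    remaining = total_slots
    while remaining > 0:
        progressed = False
        for i, demand in enumerate(demands):
            if remaining > 0 and allocation[i] < demand:
                allocation[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            # Every job is fully satisfied; leave the spare slots idle.
            break
    return allocation


if __name__ == "__main__":
    demands = [10, 4, 4]   # slots requested by three jobs (made-up numbers)
    print(fifo_allocate(demands, 12))
    print(fair_allocate(demands, 12))
```

With several competing jobs, the FIFO sketch lets the first job crowd out the others, while the fair share sketch gives each job an equal portion. With a single job the two sketches allocate identically, so the single-job advantage of FIFO reported above comes from effects this sketch does not model.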
Keywords/Search Tags: Cloud Computing, Hadoop, Fair Share Scheduling, Virtualization technology