Research And Improvement Of The MapReduce Framework In Cloud Computing

Posted on:2014-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:C J Wang

Full Text:PDF

GTID:2268330425472448

Subject:Computer Science and Technology

Abstract/Summary:


With the growth of the Internet, efficiently processing large amounts of data as information grows rapidly has become a serious problem. Traditional approaches to handling large data sets are costly. The Hadoop platform, which emerged from the cloud computing revolution, can readily cope with huge amounts of structured or unstructured data and process it in a massively parallel fashion. Hadoop offers a more convenient, cheaper, faster and safer way to process massive data. Research on improving the stability of the framework and optimizing system performance is therefore increasingly significant.

In this thesis, starting from the structure of the Hadoop framework, we study node load balancing and task scheduling optimization.

Firstly, we analyse how the intermediate results of Map tasks are mapped, point out the problem of data skew, and put forward two methods for balancing the data mapping: the fair load online model and the fair load offline model. The online model requires the distribution of keys to be analysed in advance, while the offline model requires the performance of the task slots to be measured. We then propose a method for measuring node performance.

Secondly, we analyse the data locality issue, pointing out the importance of data locality and how it is affected by a heterogeneous environment. We study task scheduling, analyse three existing scheduling algorithms, and propose a node delay matching scheduling algorithm to improve the degree of data locality matching.

Finally, we built a distributed Hadoop environment and ran experiments on the cluster to compare the new method, with load balancing and node delay scheduling, against the original scheduling algorithm. The experiments showed that the improved method achieved better data locality and response time for most types of jobs.

In this thesis, we studied intermediate data mapping and task scheduling on the Hadoop platform and analysed the defects and performance bottlenecks of the framework. We proposed several improved algorithms, tested them on the cluster, and provide new ideas and methods for optimizing and upgrading the performance of the Hadoop platform.
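
The abstract does not give the details of the fair load models, so the following is only a minimal sketch of the general idea behind skew-aware mapping of intermediate keys, not the thesis's algorithm. It assumes the per-key record counts are already known and that each reduce slot has a measured relative speed. The names `assign_keys`, `reducer_loads`, `key_counts` and `slot_speeds` are hypothetical, and the greedy heaviest-key-first rule is a stand-in technique chosen for illustration.

```python
from typing import Dict, List


def assign_keys(key_counts: Dict[str, int], slot_speeds: List[float]) -> Dict[str, int]:
    """Map each intermediate key to a reducer index.

    Hypothetical greedy rule: take keys from heaviest to lightest and give
    each one to the reducer whose estimated finish time (load / speed)
    would be smallest after receiving it.
    """
    if not slot_speeds or any(s <= 0 for s in slot_speeds):
        raise ValueError("slot_speeds must be non-empty and positive")

    loads = [0.0] * len(slot_speeds)
    assignment: Dict[str, int] = {}
    # Sort by descending count; break ties by key so the result is deterministic.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(
            range(len(loads)),
            key=lambda r: (loads[r] + count) / slot_speeds[r],
        )
        assignment[key] = target
        loads[target] += count
    return assignment


def reducer_loads(key_counts: Dict[str, int], assignment: Dict[str, int], n: int) -> List[int]:
    """Total number of records each of the n reducers receives."""
    loads = [0] * n
    for key, reducer in assignment.items():
        loads[reducer] += key_counts[key]
    return loads


if __name__ == "__main__":
    # Assumed, skewed key distribution: one key holds half of all records.
    counts = {"a": 50, "b": 20, "c": 15, "d": 10, "e": 5}

    equal = assign_keys(counts, [1.0, 1.0])
    print(reducer_loads(counts, equal, 2))   # [50, 50]

    # Assumed heterogeneous slots: the first is twice as fast as the second.
    mixed = assign_keys(counts, [2.0, 1.0])
    print(reducer_loads(counts, mixed, 2))   # [70, 30]
```

With equal slot speeds the heavy key `a` gets a reducer to itself and the remaining keys share the other, so both reducers receive 50 records. With unequal speeds the faster slot receives more records, so the estimated finish times (35 and 30 time units) stay close.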

Keywords/Search Tags:

cloud computing, Hadoop, MapReduce, delay schedule


Related items

1	Research And Improvement Of The MapReduce Framework In Cloud Computing
2	Topology Design And Hadoop Research In Cloud Computing
3	Researches About Cloud Computing And Expolit And Test Hadoop Program
4	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
5	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
6	The Design Of The Cloud Computing System Based On Hadoop
7	The Cloud Computing Based On Hadoop Platform And Log Analysis
8	Research On The Application Of Cloud Computing Based On Hadoop
9	Optimization And Application Research Of MapReduce Computing Model Based On Hadoop
10	Design And Implementation Of Visual Data Platform Based On MapReduce