Performance Optimization And Applications Of MapReduce In Cloud Computing

Posted on:2012-07-20

Degree:Master

Type:Thesis

Country:China

Candidate:X X Chen

Full Text:PDF

GTID:2178330338497896

Subject:Computer system architecture

Abstract/Summary:

Since 2007, cloud computing has been a relatively hot concept in the international IT industry, and since 2008 it has also developed rapidly in China. With the sharp increase in data volume, how to store and process massive data quickly and efficiently has become a problem in urgent need of a solution, and this problem is one of the motivations for which cloud computing was proposed. This makes the popularization and application of cloud computing an unavoidable and irreversible trend. Cloud computing itself, however, is only a way of thinking. To really exploit its advantages, there must be a programming model that supports and realizes the idea.

MapReduce is a parallel programming model proposed by Google. It provides software support for processing massive data, and it makes parallel execution easy and practical by providing simple yet powerful interfaces. In this thesis, we analyze the concepts, advantages, and implementation mechanisms of Google's MapReduce and GFS. We then point out that, in the MapReduce workflow, the mechanism for processing intermediate key-value pairs is inflexible and the intermediate results are not reduced at the earliest opportunity. To address this shortcoming, we introduce an associative array into the map function so that the merging of intermediate results is performed automatically. With this method, the number of intermediate key-value pairs and the network bandwidth consumed are effectively reduced.

We also design and implement a text classifier based on the improved MapReduce. Classification of massive data is a problem often encountered in text processing and data mining, but traditional algorithms can only cope with small-scale data. As the data grows, the algorithm runs more and more slowly and eventually becomes the bottleneck of data mining. Our classifier is implemented in parallel on a cluster, which greatly improves execution efficiency and gives the algorithm better real-time performance.

To verify the performance of the improved MapReduce, we conducted experiments comparing the running times of different algorithms. The experimental platform is Hadoop, an open-source implementation of MapReduce. The results show that the new algorithm is more efficient than conventional algorithms. The classifier is also implemented on the Hadoop platform, and a comparison of different methods shows that the MapReduce-based classifier has better efficiency and scalability.
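
The abstract does not give the author's implementation, so the following is only a minimal sketch of the general idea of merging intermediate key-value pairs inside the map function with an associative array. It uses plain Python rather than Hadoop, takes word counting as an illustrative job, and all function names (`map_plain`, `map_with_assoc_array`, `shuffle_and_reduce`, `run_job`) are hypothetical.

```python
from collections import Counter, defaultdict


def map_plain(document):
    """Baseline map: emit one (word, 1) pair per token."""
    return [(word, 1) for word in document.split()]


def map_with_assoc_array(document):
    """Map with a local associative array (assumed design, not the thesis code).

    Counts are merged per key inside the map task, so each distinct
    word in the document is emitted only once.
    """
    local_counts = Counter(document.split())
    return list(local_counts.items())


def shuffle_and_reduce(pairs):
    """Group intermediate pairs by key and sum their values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(documents, map_fn):
    """Apply map_fn to every document, then reduce.

    Returns the final result and the number of intermediate pairs,
    which stands in for the data that would cross the network.
    """
    intermediate = []
    for doc in documents:
        intermediate.extend(map_fn(doc))
    return shuffle_and_reduce(intermediate), len(intermediate)


if __name__ == "__main__":
    docs = ["cloud data cloud", "data data map", "cloud map reduce"]
    plain_result, plain_pairs = run_job(docs, map_plain)
    merged_result, merged_pairs = run_job(docs, map_with_assoc_array)
    print(plain_result == merged_result)  # both variants give the same counts
    print(plain_pairs, merged_pairs)      # intermediate pairs: 9 vs 7
```

In this toy run both map variants produce the same final counts, while the associative-array variant emits fewer intermediate pairs. The saving grows with the number of repeated keys within each map task's input.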

Keywords/Search Tags:

Cloud Computing, Massive Data, MapReduce, Programming Model, Classifier

Related items

1	Researches And Application Of Mapreduce Parallel Programming Model For Cloud Computing
2	Research On MapReduce Parallel Programming Model In The Cloud Computing
3	The Research And Implementation Of Diversity Demand Oriented Parallel Computing Model
4	The Research And Implementation Of Comprehensive Mapreduce
5	The Design And Implementation Of A MAPREDUCE Based Distributed Programming Framework
6	Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data
7	The Research Of Parallel Clustering Algorithm Of Massive Data In Cloud Computing Environment
8	The Process And Research Of Massive Data Mining Based On Cloud Computing
9	Research On Parallel Skyline Algorithms And Their Applications In Cloud Computing Environment
10	Probabilistic Graphical Models For Data-intensive Computing Construction Method And Implementation