Performance Optimization And Applications Of MapReduce In Cloud Computing

Posted on: 2012-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: X X Chen
GTID: 2178330338497896
Subject: Computer system architecture
Abstract/Summary:
Since 2007, cloud computing has become a hot topic in the international IT industry, and since 2008 it has also developed rapidly in China. As data volumes grow sharply, storing and processing massive data quickly and efficiently has become an urgent problem, and this problem is one of the motivations behind cloud computing. The adoption and application of cloud computing has therefore become an unavoidable and irreversible trend. Cloud computing itself, however, is only a conceptual model; to realize its advantages in practice, a programming model is needed to support and implement it.
MapReduce is a parallel programming model proposed by Google that provides software support for processing massive data. By exposing a small set of simple yet powerful interfaces (a map function and a reduce function), it makes parallel execution easy and practical. In this thesis, we analyze the concepts, advantages, and implementation mechanisms of Google's MapReduce and GFS. We then point out that, in the MapReduce workflow, the handling of intermediate key-value pairs is inflexible and intermediate results are not reduced at the earliest opportunity. To address this shortcoming, we introduce an associative array into the map function so that intermediate results are merged automatically inside the map phase. This effectively reduces both the number of intermediate key-value pairs and the network bandwidth they consume.
We also design and implement a text classifier based on the improved MapReduce. Classifying massive data is a common problem in text processing and data mining, but traditional algorithms are suited only to small-scale data: as the data grows, they run more and more slowly and eventually become the bottleneck of the data-mining process. Our classifier is implemented in parallel on a cluster.
The parallel implementation greatly improves execution efficiency and gives the algorithm better real-time performance.
To verify the performance of the improved MapReduce, we conducted experiments comparing the running times of different algorithms on Hadoop, an open-source implementation of MapReduce. The results show that the improved algorithm is more efficient than the conventional algorithms. The classifier was also implemented on Hadoop, and a comparison of different methods shows that the MapReduce-based classifier offers better efficiency and scalability.
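The abstract does not include the thesis's code, so the following is only a minimal sketch of the associative-array idea described above (a pattern often called in-mapper combining), assuming a word-count job as the example workload. It simulates a single MapReduce job in plain Python rather than on Hadoop, and the function names `map_plain`, `map_with_assoc_array`, `reduce_sum`, and `run_job` are hypothetical. The baseline mapper emits one (word, 1) pair per token, while the modified mapper first merges counts in a dictionary and then emits one pair per distinct word, so fewer intermediate pairs reach the shuffle.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KV = Tuple[str, int]  # an intermediate (key, value) pair


def map_plain(doc: str) -> Iterator[KV]:
    """Baseline map (hypothetical): emit one (word, 1) pair per token."""
    for word in doc.split():
        yield word, 1


def map_with_assoc_array(doc: str) -> Iterator[KV]:
    """Map with an associative array (hypothetical sketch): merge counts
    locally, then emit one (word, partial_count) pair per distinct word."""
    counts: Dict[str, int] = defaultdict(int)
    for word in doc.split():
        counts[word] += 1
    yield from counts.items()


def reduce_sum(key: str, values: Iterable[int]) -> KV:
    """Reduce: add up all partial counts received for one key."""
    return key, sum(values)


def run_job(docs: Iterable[str],
            mapper: Callable[[str], Iterable[KV]]) -> Tuple[Dict[str, int], int]:
    """Simulate one job: map every split, group values by key (the shuffle),
    then reduce. Returns the final counts and the number of shuffled pairs."""
    groups: Dict[str, List[int]] = defaultdict(list)
    shuffled = 0
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
            shuffled += 1
    result = dict(reduce_sum(k, vs) for k, vs in sorted(groups.items()))
    return result, shuffled


if __name__ == "__main__":
    docs = ["a b a a c", "b b a"]  # each string stands in for one input split
    for mapper in (map_plain, map_with_assoc_array):
        counts, pairs = run_job(docs, mapper)
        print(f"{mapper.__name__}: {pairs} intermediate pairs -> {counts}")
```

With the two inputs above, both mappers yield the same final counts (`a`: 4, `b`: 3, `c`: 1), but the baseline shuffles 8 intermediate pairs and the associative-array version only 5. In an actual Hadoop job the array would typically live for the whole map task and be flushed in the mapper's cleanup step, so merging spans many records; this trades mapper memory for shuffle volume, whereas Hadoop's separate Combiner is applied by the framework, which may run it zero or more times.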
Keywords/Search Tags: Cloud Computing, Massive Data, MapReduce, Programming Model, Classifier