
Research And Optimization Of Availability For MapReduce In Cloud Computing

Posted on: 2012-05-30   Degree: Master   Type: Thesis
Country: China   Candidate: Y K Zhou
GTID: 2178330338484232   Subject: Software engineering
Abstract/Summary:
Cloud computing has gained significant traction in recent years. Businesses today face tremendous challenges from increasingly complex applications and dramatic growth in data volumes, which often reach the terabyte or even petabyte scale. Processing data at this scale is one of the major problems cloud computing must solve. Because the volume of data is so large, a single machine cannot meet the performance and reliability requirements of mass data processing. Handling massive data in distributed systems is therefore currently the main challenge of cloud computing, and conventional computing models can no longer meet the data-processing needs of a cloud environment. Against this background, the MapReduce programming model emerged. MapReduce is not perfect, however. Most researchers focus on its algorithmic efficiency, its internal algorithms, its integration with existing systems, or its combination with existing methodologies; few have tried to improve the architecture of MapReduce itself.

In this paper, the Google cloud computing platform is studied, in particular its MapReduce framework and the Google File System (GFS). This paper also studies and uses Hadoop, an open-source cloud computing platform that implements MapReduce (as Hadoop MapReduce) and GFS (as HDFS). A small cluster was built to process large-scale data, and in doing so a single point of failure (SPOF) problem was identified. To solve this problem, this paper proposes a hierarchical master-worker architecture with replication of the master node's metadata. This solution eliminates the SPOF from the system and reduces the workload on the master node. A new small cluster was then built based on this design. After system testing, data collection and data comparison, the conclusion of this paper is that, despite some decrease in execution efficiency, the proposed solution gives the system good availability and reduces the workload of the master node. The solution is therefore usable in practice.

This paper first reviews cloud computing technology in China and abroad and analyzes the cloud computing architectures used by large companies. It then researches and analyzes the MapReduce programming model and the Google File System, the two most important technologies in cloud computing, and carries out experiments based on this research. It proposes using a hierarchical master-worker architecture with metadata replication on the master node to address the SPOF in current application systems; this design also reduces the workload on the master node. Finally, after system testing, data collection and data analysis, the conclusions are presented.
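The abstract gives no implementation details, so the following is only a hypothetical Python sketch of the two ideas it names, not the author's design. It models (1) a hierarchical master-worker layout, where a top-level master delegates tasks to sub-masters that each manage their own workers, so the top-level master records only which sub-master owns a task; and (2) metadata replication across master replicas, so that losing the primary master does not lose the metadata. All class and method names (`MasterNode`, `ReplicatedMaster`, `SubMaster`, `HierarchicalScheduler`) are invented for illustration and do not come from the thesis or from Hadoop.

```python
from dataclasses import dataclass, field


@dataclass
class MasterNode:
    """One replica of the top-level master and its metadata (hypothetical model)."""
    name: str
    metadata: dict = field(default_factory=dict)
    alive: bool = True


class ReplicatedMaster:
    """Primary master plus standby replicas.

    Every metadata write is applied to all live replicas before put() returns,
    so any surviving replica holds a complete copy and can take over.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)

    @property
    def primary(self):
        # The first live replica acts as primary; failover is "skip dead nodes",
        # so no single master crash makes the metadata unavailable.
        for node in self.nodes:
            if node.alive:
                return node
        raise RuntimeError("no live master replica")

    def put(self, key, value):
        for node in self.nodes:
            if node.alive:
                node.metadata[key] = value

    def get(self, key):
        return self.primary.metadata[key]


class SubMaster:
    """Second tier: schedules tasks round-robin onto its own group of workers."""

    def __init__(self, name, workers):
        self.name = name
        self.workers = list(workers)
        self.assignments = {}  # task_id -> worker, kept here, not on the top master
        self._next = 0

    def assign(self, task_id):
        worker = self.workers[self._next % len(self.workers)]
        self._next += 1
        self.assignments[task_id] = worker
        return worker


class HierarchicalScheduler:
    """Top-level master tracks only which sub-master owns each task."""

    def __init__(self, master, sub_masters):
        self.master = master
        self.sub_masters = list(sub_masters)

    def submit(self, task_id):
        sub = self.sub_masters[task_id % len(self.sub_masters)]
        worker = sub.assign(task_id)
        self.master.put(task_id, sub.name)  # small record, replicated to all masters
        return sub.name, worker


masters = ReplicatedMaster([MasterNode("m0"), MasterNode("m1")])
scheduler = HierarchicalScheduler(
    masters, [SubMaster("s0", ["w0", "w1"]), SubMaster("s1", ["w2", "w3"])]
)
placements = [scheduler.submit(t) for t in range(4)]
masters.nodes[0].alive = False  # simulate a crash of the primary master
print(masters.primary.name, masters.get(2))  # m1 s0
```

In this sketch the standby replica `m1` answers metadata queries after `m0` fails because every write was copied to it synchronously. A real system would also need to resynchronize a restarted replica and agree on which replica is primary (for example through a coordination service); the sketch omits both.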
Keywords/Search Tags: Cloud Computing, MapReduce, Master-Worker Architecture, Metadata Replication, Availability