Research of Hadoop Cluster Based on Eucalyptus Cloud Platform

Posted on: 2012-09-29 | Degree: Master | Type: Thesis
Country: China | Candidate: G L Xie | Full Text: PDF
GTID: 2218330338467700 | Subject: Computer application technology
Abstract/Summary:
Cloud computing is a research focus both domestically and abroad. It grew out of parallel computing, grid computing and distributed computing, and represents a new commercial computing model. Many large companies, such as Google, IBM and Microsoft, have invested in research in this field. Cloud computing integrates very large-scale computing and storage resources over the Internet into a virtual resource pool and delivers them to users as on-demand services, so that people can easily obtain computing power, storage capacity and infrastructure through the network. It can be an effective solution to massive data analysis and processing problems: it provides a reliable, scalable data processing and storage center, reducing the demands on terminal equipment while improving data-processing capability. As a result, running complex, resource-intensive computations such as massive data processing across multiple distributed nodes over the network is becoming a new and effective approach.

Hadoop, the open-source distributed computing system from the Apache Software Foundation, provides valuable experience for the concrete realization of cloud computing and has been applied at Amazon, Facebook, Baidu, Yahoo and others. The core of the Hadoop framework is MapReduce and HDFS. The idea behind MapReduce is "task decomposition and result summarization": a job is first split into multiple subtasks, which are handed to the task scheduler to run on multiple cluster nodes; the partial results are then gathered and summarized. HDFS, the Hadoop Distributed File System, provides the underlying storage support for distributed computing.

This thesis studied the relevant theory, features, advantages and key technologies of cloud computing, examined the Eucalyptus open-source cloud computing system, and explored the operating mechanism and principles of the Hadoop open-source distributed framework.
Building on this, the thesis studied the MapReduce programming model and its programming methods, as well as the setup of Hive, a data warehouse built on the Hadoop infrastructure. It then proposed MapReduce-based and Hive-based programming models for processing massive log files. The thesis also used the open-source cloud platform Eucalyptus to build a private laboratory cloud, built a distributed computing platform with Hadoop, and integrated the Hadoop cluster into the Eucalyptus private cloud. It further studied in depth the methods for dynamically expanding a Hadoop cluster, including adding nodes to the cluster, rebalancing the system load after the cluster grows, and removing nodes from the cluster. In addition, the thesis proposed a heartbeat detection strategy and a load balancing strategy and, on that basis, an elastic scaling ("elastic stretching") solution for Hadoop clusters on the Eucalyptus cloud that expands and contracts the cluster without human intervention; a prototype of the whole scalable framework was implemented. Finally, the thesis pointed out that the open-source software Eucalyptus, Ganglia and Hadoop can be combined into a commercial product.
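
The "task decomposition and result summarization" idea applied to log files can be illustrated with a small single-process sketch. This is not the thesis's implementation and does not use the Hadoop API; the function names, the sample data and the assumed log format ("<timestamp> <LEVEL> <message>") are all hypothetical, chosen only to show the split, map, shuffle and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, num_splits):
    """Decompose the job: deal the log lines into num_splits subtasks."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(split):
    """Map: emit a (level, 1) pair for every well-formed line in one split."""
    pairs = []
    for line in split:
        fields = line.split()
        # Assumed (hypothetical) format: "<timestamp> <LEVEL> <message>"
        if len(fields) >= 2:
            pairs.append((fields[1], 1))
    return pairs


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: summarize each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(lines, num_splits=3):
    """Run the whole pipeline sequentially; a real cluster would run
    the map tasks on different nodes in parallel."""
    splits = split_input(lines, num_splits)
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data for illustration only.
SAMPLE_LOG = [
    "2012-09-29T10:00:01 INFO job started",
    "2012-09-29T10:00:02 WARN slow node",
    "2012-09-29T10:00:03 INFO task finished",
    "2012-09-29T10:00:04 ERROR task failed",
    "2012-09-29T10:00:05 INFO job finished",
]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

The result does not depend on how many splits are used, which is the property that lets a framework such as Hadoop distribute the map tasks across however many nodes the cluster currently has.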
Keywords/Search Tags:Cloud Computing, Eucalyptus, Hadoop, MapReduce, Elastic Stretching