
The Design And Implementation Of Deployment And Management System For Hadoop Cluster

Posted on: 2014-01-24    Degree: Master    Type: Thesis
Country: China    Candidate: B Wang    Full Text: PDF
GTID: 2248330395495871    Subject: Software engineering
Abstract/Summary:
Applications with big data requirements, such as Internet services, scientific data processing, and business intelligence analytics, are becoming increasingly common. Hadoop, the open-source distributed file system and parallel computing framework, has been widely deployed and applied. However, deploying and managing a Hadoop cluster is not easy, mainly because of the large number of configuration parameters in Hadoop-related systems and because clusters can scale to hundreds or thousands of servers.

This paper studies Hadoop-related systems and compares several Hadoop cluster deployment and management schemes. It then designs and implements HDMS, a deployment and management system for Hadoop clusters. The system automatically deploys Hadoop-related systems, manages node roles, changes Hadoop configurations, starts or stops system services, and monitors the running state of the system. The work consists of the following four parts.

(1) The design and implementation of the configuration interface. The Hadoop cluster configuration items are extracted, abstracted into a parameterized interface, and centralized on the HDMS management node. This gives upper-layer application systems an intuitive and convenient way to configure the cluster.

(2) The design and implementation of the cluster deployment module. It prepares the essential environment for the cluster by remote command execution, including the network environment, the software repository, the time synchronization service, and Puppet. It provides a unified cluster deployment interface for upper-layer application systems.

(3) The design and implementation of the Hadoop components module. This module uses the Puppet resource description language to manage cluster resources, including the software packages of the Hadoop components, their configuration files, and their services. It also designs and implements the memory allocation algorithm for Hadoop services and the computation of MapReduce task slots.

(4) The design and implementation of the security module and the monitoring module. These cover the deployment and management of the Kerberos authentication system and the Ganglia cluster monitoring system, automatically configuring the related parameters so that both integrate with Hadoop, thereby providing security assurance and a monitoring facility.
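The abstract does not include the deployment module's code; as a minimal illustration of "remote command execution", assuming ssh key-based access and yum-managed nodes (both assumptions, not stated in the thesis), a bootstrap step might be sketched like this:

```python
import subprocess

def build_ssh_command(host, command, user="root"):
    """Compose the ssh invocation for one remote step (pure and testable)."""
    return ["ssh", f"{user}@{host}", command]

def run_remote(host, command, user="root"):
    """Execute a shell command on a remote node and return its stdout."""
    result = subprocess.run(build_ssh_command(host, command, user),
                            capture_output=True, text=True, check=True)
    return result.stdout

def bootstrap_node(host):
    # Mirrors the module's responsibilities from the abstract:
    # time synchronization service, software installation, Puppet agent.
    for cmd in ("yum -y install ntp",
                "service ntpd start",
                "yum -y install puppet"):
        run_remote(host, cmd)
```

The hostnames, package names, and commands here are hypothetical; the point is only that a management node can drive per-node setup over ssh before handing ongoing configuration to Puppet.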
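The thesis's actual memory allocation algorithm and slot formula are not given in the abstract. As a hedged sketch of the common heuristic behind "computation of MapReduce task slots", one might divide a worker node's memory between daemons and per-task heaps and bound the result by CPU cores; every constant below is an assumption for illustration only:

```python
def compute_task_slots(total_mem_mb, cores,
                       daemon_mem_mb=1024,   # reserved for DataNode/TaskTracker (assumed)
                       task_mem_mb=512,      # heap per map/reduce task (assumed)
                       map_reduce_ratio=2):  # map slots per reduce slot (assumed)
    """Return (map_slots, reduce_slots) for one worker node.

    Slots are bounded by both memory (remaining after daemons) and CPU
    (here capped at two slots per core), then split between map and
    reduce according to the given ratio.
    """
    usable = max(total_mem_mb - daemon_mem_mb, 0)
    max_slots = min(usable // task_mem_mb, 2 * cores)
    reduce_slots = max(int(max_slots // (map_reduce_ratio + 1)), 1)
    map_slots = max(max_slots - reduce_slots, 1)
    return map_slots, reduce_slots
```

For example, a node with 8 GB of RAM and 4 cores would, under these assumed constants, get 6 map slots and 2 reduce slots; in MRv1 these values would then be written into `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.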
Keywords/Search Tags: Big data processing, Hadoop, Cluster deployment, Puppet