
The Design and Implementation of an HDFS-Based Cloud Data Backup System

Posted on: 2012-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Du
Full Text: PDF
GTID: 2178330332999922
Subject: Software engineering
Abstract/Summary:
As a data security strategy, backup is the last and most fundamental safeguard against data loss. The arrival of cloud storage technology offers a new approach to data backup. Cloud storage has several characteristics that make it suitable for backup: it is a complete data storage service that gives users intelligent backup software and well-managed storage capacity; users only need to back up their data, without worrying about managing previously stored data; and it is attractively priced, since backing up the same volume of data costs far less than building a data center by purchasing storage devices.

Based on the cloud storage software HDFS, this paper designs an HDFS-based data backup system. It meets users' backup and restore needs by making full use of cloud storage technology and by using existing inexpensive computers to build a data backup cluster.

The system is divided into clients, backup servers, and an HDFS cluster.

Clients are the nodes from which users back up or restore data. They can be grouped into several clusters by region, bandwidth, and similar criteria. When data needs to be backed up or restored, a client sends a request to the backup server in charge of its cluster, and the operation can proceed only after permission is granted.

The backup servers are the bridge between the clients and the HDFS cluster for backup and restore. They are several high-performance servers with large storage, each in charge of one client cluster. They accept backup and restore requests from clients, authenticate users, and temporarily store the data sent by clients. Because HDFS is best suited to storing large files, the backup server merges small files before uploading them, using an upload threshold, which improves system performance. It also maintains a backup file mapping table for clients.
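The small-file merging and the mapping table described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the class names, the `batch-NNNNNN` container naming, the byte-offset layout of a mapping entry, and the in-memory `uploaded` dictionary standing in for HDFS are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MappingEntry:
    """Where one original small file lives inside a merged container (hypothetical layout)."""
    container: str  # name of the merged file that would be uploaded to HDFS
    offset: int     # byte offset of the original file inside the container
    length: int     # size of the original file in bytes


@dataclass
class SmallFileBatcher:
    """Stages small files and merges them once an upload threshold is reached."""
    threshold: int                                 # upload threshold in bytes
    pending: list = field(default_factory=list)    # staged (name, data) pairs
    pending_bytes: int = 0
    uploaded: dict = field(default_factory=dict)   # container name -> bytes (stand-in for HDFS)
    mapping: dict = field(default_factory=dict)    # original file name -> MappingEntry
    _next_id: int = 0

    def add(self, name: str, data: bytes) -> None:
        """Stage one client file; flush when the staged size reaches the threshold."""
        self.pending.append((name, data))
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Merge all staged files into one container and record the mapping entries."""
        if not self.pending:
            return
        container = f"batch-{self._next_id:06d}"
        self._next_id += 1
        blob = bytearray()
        for name, data in self.pending:
            self.mapping[name] = MappingEntry(container, len(blob), len(data))
            blob += data
        self.uploaded[container] = bytes(blob)  # a real system would write to HDFS here
        self.pending.clear()
        self.pending_bytes = 0

    def read(self, name: str) -> bytes:
        """Recover one original file from its container using the mapping table."""
        entry = self.mapping[name]
        blob = self.uploaded[entry.container]
        return blob[entry.offset:entry.offset + entry.length]
```

With a threshold of 10 bytes, for example, adding a 5-byte file leaves it staged, and adding a 6-byte file afterwards triggers a single merged upload. Storing offsets in the mapping table is one possible design; it lets a single file be recovered without unpacking the whole container.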
When a client requests a restore, the backup server retrieves the backup files from the HDFS cluster and sends them to the client according to the file mapping table.

The HDFS cluster consists of computers with HDFS installed. It provides the backup servers with backup and restore services and so realizes the core function of the system. The cluster is made up of one Namenode and a number of Datanodes. The Namenode manages the file system namespace and the mapping from data blocks to specific Datanodes. The Datanodes store the data and can be added dynamically as the volume of backup data grows, drawing on the large number of inexpensive computers inside the enterprise.

The HDFS-based data backup system designed in this paper has strengths in security, expansibility, economy, and reliability:

Security: through user authentication, authorization, and restricted access to the system, the backup server guarantees security between the clients and itself. The Hadoop security mechanism ensures the safety of communication and data transmission between the HDFS cluster and the backup servers.
Expansibility: thanks to Hadoop's strong storage and computing scalability, the HDFS cluster can be expanded at any time to increase the system's backup capacity.
Economy: HDFS is a distributed file system designed for inexpensive hardware and has good compatibility, so any computer can join the backup cluster by installing HDFS. A large amount of idle computing resources can therefore be used, reducing the cost of procuring devices.
Reliability: backup files in the HDFS cluster are protected mainly by replication, and the number of replicas can be increased according to a file's importance to improve reliability.

Cloud storage is an emerging technology. Our next goal is to study how it can play a more important role in data backup.
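The reliability point above, raising the replica count for more important files, can be sketched as follows. This is an illustration under stated assumptions, not the system's actual policy: the importance levels, the replica counts assigned to them, and the example path are hypothetical. The only Hadoop-specific parts are the default replication factor of 3 and the standard `hdfs dfs -setrep` shell command, which the function only builds and does not execute.

```python
# HDFS's default replication factor (dfs.replication) is 3.
DEFAULT_REPLICATION = 3

# Hypothetical importance levels and replica counts, chosen only for illustration.
REPLICAS_BY_IMPORTANCE = {
    "normal": 3,
    "important": 4,
    "critical": 5,
}


def replication_for(importance: str) -> int:
    """Return the replica count for an importance level, falling back to the default."""
    return REPLICAS_BY_IMPORTANCE.get(importance, DEFAULT_REPLICATION)


def setrep_command(hdfs_path: str, importance: str) -> list[str]:
    """Build the argument list for `hdfs dfs -setrep -w <replicas> <path>`.

    The -w flag asks the command to wait until replication has completed.
    The command is returned rather than run, so no Hadoop install is needed here.
    """
    replicas = replication_for(importance)
    return ["hdfs", "dfs", "-setrep", "-w", str(replicas), hdfs_path]
```

A backup server could pass such a list to `subprocess.run` after uploading a merged file. For a hypothetical path `/backup/batch-000000` marked `"critical"`, the list requests five replicas.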
Keywords/Search Tags:cloud computing, cloud storage, backup system, HDFS