Research On Distributed Data Full Backup And Incremental Backup Of File System

Posted on: 2010-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Shang
GTID: 2178360272996269
Subject: Computer software and theory

Abstract/Summary:
With the steady maturing of computer hardware, the constant updating of software, and the rapid development of the Internet, the information age has arrived. Individuals, companies, and enterprises alike no longer need to manage their important data by hand; the tedious and complex work can be done by computer systems. In practice, however, the working environment is extremely complicated. Computer viruses, disk failures, operator error, and other unpredictable events can damage some or all of the data files and cause huge losses.

At present, making regular backups of critical data is the only viable solution. Backups can avoid much of the trouble caused by data loss and reduce the extent of the loss. In addition, distributed applications have become increasingly widespread and distributed technologies increasingly popular in recent years, so distributed backup and restore systems have gradually come to be taken seriously. This paper draws on the advantages of distributed systems to develop a distributed backup and restore system for file systems.

The backup index is very important in a backup system. From a macro point of view, the functionality of the entire backup system is determined by the backup index; more specifically, the backup index records all the "historical traces" of the backed-up data, which makes it a key component of the backup system.

One of the key points of this paper is the design of the framework and its implementation in code. The backup index records everything that happens during the data backup process, together with the results. Incremental backup, as one way of backing up data, is handled differently by different backup systems: file backups may be based on modification time or file size, while database backups may be based on transaction numbers or logs.
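As an illustration of the file-based criterion mentioned above (modification time and file size), the following is a minimal sketch rather than the system's actual implementation. The names `Snapshot`, `take_snapshot`, and `changed_since` are hypothetical, and the sketch assumes that a file counts as changed whenever its modification time or size differs from the previous backup.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical record kept per backup: relative path -> (mtime, size).
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record modification time and size for every file under root."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            snapshot[os.path.relpath(path, root)] = (info.st_mtime, info.st_size)
    return snapshot


def changed_since(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return paths that are new, or whose mtime or size differs (assumed rule)."""
    return sorted(
        path for path, meta in current.items() if previous.get(path) != meta
    )
```

An incremental backup under this assumption would copy only the paths returned by `changed_since`; deleted files are not tracked in this sketch.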
Overall there is no unified criterion, so each type of system applies its own rules for judging what has changed, and extending functionality becomes increasingly inconvenient. This is partly because the backup index has not received enough attention: its weak expressive power and poor adaptability lead directly to such problems. By analyzing the problems of backup indexes in earlier backup systems, this paper summarizes the basic design ideas, design principles, and design methods for the backup index.

To strengthen the backup index's ability to express information, this paper proposes the concept of the backup tree. Every backup job is recorded in the backup index. Viewed as a whole, the collection of jobs looks haphazard, but viewed from a single piece of source data, certain regularities emerge. The backup tree is put forward to capture one of these regularities: it organizes job-related information into a tree. The backup tree is an abstract concept and is not stored as such in the backup index; to reflect the tree in the backup index, the data structures need some special treatment in their design.

Based on the backup tree, this paper presents a simple tree-based backup recovery algorithm. Because each backup method has its own characteristics, different backup methods produce different tree structures. Each child node in the backup tree depends on its parent node; that is, the child's backup is made on the basis of the parent's. The logical structure of the backup tree makes restore jobs easier. It provides a theoretical foundation for reducing redundant and duplicate data during recovery, speeding up recovery, and improving its reliability.

The system provides distributed backup and restore for file systems.
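The backup tree and tree-based recovery described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the author's data structure or algorithm: the class `BackupNode`, its fields, and the function `restore_plan` are hypothetical. The sketch assumes that each index record stores only a reference to its parent job, so the tree exists implicitly, and that a restore takes each file from the newest node on the path back to the full backup.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BackupNode:
    """One backup job in the index; all field names are hypothetical."""
    job_id: int
    parent_id: Optional[int]  # None marks a full backup (a tree root)
    files: Dict[str, str] = field(default_factory=dict)  # path -> stored blob id


def restore_plan(index: Dict[int, BackupNode], job_id: int) -> Dict[str, str]:
    """Walk from the requested job up to its root, newest node first.

    Each path is taken from the first (newest) node that contains it, so
    older copies of the same file are never fetched.
    """
    plan: Dict[str, str] = {}
    current: Optional[int] = job_id
    while current is not None:
        node = index[current]
        for path, blob in node.files.items():
            plan.setdefault(path, blob)  # keep the newest copy only
        current = node.parent_id
    return plan


# Example index: job 1 is a full backup, jobs 2 and 3 form an incremental
# chain, and job 4 is a separate backup taken directly on top of job 1.
index = {
    1: BackupNode(1, None, {"a.txt": "blob-a1", "b.txt": "blob-b1"}),
    2: BackupNode(2, 1, {"a.txt": "blob-a2"}),
    3: BackupNode(3, 2, {"c.txt": "blob-c3"}),
    4: BackupNode(4, 1, {"b.txt": "blob-b4"}),
}
plan = restore_plan(index, 3)
```

Restoring job 3 in this example reads `a.txt` from job 2 and skips the older copy in job 1, which is one way the tree structure can reduce redundant data during recovery. File deletions are not modeled here.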
The distributed backup system is divided into a management server, media servers, and client services. The management server is the core of the whole system: it manages the system's core data, receives messages from the media servers and clients, issues users' commands, carries out the load balancing strategy, and so on. A media server stores users' backup data, called target data. It provides maintenance and management functions for the target data, handles data encryption and compression, and hides its own physical location. Clients let the management server browse the local resources they have opened, verify users' rights, send backup data, receive recovered data, and so on. A media server can be placed anywhere on the Internet as long as it has network access; media servers may be dispersed or concentrated. A client only needs to be able to connect to the management server and does not need to know anything about the media servers. Because the backup index is embedded in the management server and manages all of the "historical traces" of the backups, the system can work normally offline: without any special treatment, the backup index can identify users' data and perform local backup and recovery.

The biggest issue for a distributed system is the performance bottleneck. To avoid it, a load balancing strategy is introduced into the system. In the backup system, target data streams are dynamic and are sent in real time during the backup process, so the system needs a dynamic load balancing strategy that adjusts the backup strategy in real time. When there is a large amount of source data, the media servers may become overloaded. If one or several media servers are overloaded, backup efficiency drops noticeably and those servers become the system's performance bottleneck.
When other media servers are lightly loaded at the same time, the workload of all media servers needs to be adjusted in real time. The most important factors affecting media server performance are CPU utilization, memory usage, disk I/O, and network bandwidth. This paper assigns a weight to each of these four basic resources according to its level of use and combines them into a single load value. Each media server sends this value to the management server in real time, and the management server aggregates the values it receives. If a calculated value is too high, the overloaded media server may become a system bottleneck, so the management server notifies the clients in real time to send their data to a lightly loaded media server instead, keeping the load balanced.

In this paper, the backup index is designed to achieve the following functions: (1) reduce redundant data during restore jobs; (2) increase the stability of the entire system; (3) support the load balancing strategy, distributing backup and restore data evenly across the distributed environment. Together these fully meet users' backup needs.
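The weighted load calculation described above can be illustrated with a short sketch. The abstract does not give the weights, the threshold, or the exact formula, so the values in `WEIGHTS` and `OVERLOAD_THRESHOLD`, the plain weighted sum, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed weights for the four resources; the source does not state them.
WEIGHTS = {"cpu": 0.4, "memory": 0.2, "disk_io": 0.2, "network": 0.2}
OVERLOAD_THRESHOLD = 0.8  # assumed cut-off for "too high"


def load_score(usage: Dict[str, float]) -> float:
    """Weighted sum of the four utilizations, each expressed in [0, 1]."""
    return sum(weight * usage[name] for name, weight in WEIGHTS.items())


def overloaded_servers(reports: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of media servers whose score exceeds the assumed threshold."""
    return sorted(
        name for name, usage in reports.items()
        if load_score(usage) > OVERLOAD_THRESHOLD
    )


def pick_media_server(reports: Dict[str, Dict[str, float]]) -> str:
    """Media server with the lowest score, to which clients are redirected."""
    return min(reports, key=lambda name: load_score(reports[name]))


# Hypothetical reports as a management server might receive them.
reports = {
    "media-1": {"cpu": 0.9, "memory": 0.8, "disk_io": 0.9, "network": 0.7},
    "media-2": {"cpu": 0.2, "memory": 0.3, "disk_io": 0.1, "network": 0.2},
}
target = pick_media_server(reports)
```

In this example `media-1` exceeds the assumed threshold, so a management server following this sketch would direct clients to `media-2`.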
Keywords: Distributed backup, Backup and restore index, Backup tree, Data recovery