
Data Backup Based On Duplicated Data Detection

Posted on: 2011-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: X K Yang
Full Text: PDF
GTID: 2178330332960294
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of information technology, data has become the most valuable asset a company owns. Any loss of or damage to data can cause irreparable harm to the company, and data backup is undoubtedly the best preventive measure. As companies grow, the volume of data that needs to be backed up grows rapidly, so backing up this huge amount of data efficiently is an urgent problem. In addition, backups produce redundant data that takes up a large amount of disk space. Duplicated data detection is a popular technique that can detect large amounts of redundant data and reduce storage use. Combined research on duplicated data detection and data backup is therefore of great practical value.

First, this paper introduces data backup and duplicated data detection. Traditional duplicated data detection algorithms cannot recognize a file that has been renamed or moved, so this paper proposes a BirthObject ID-based File Match Scheme to address the problem in the NTFS environment. Detection algorithms based on hashing data blocks cannot adapt the block size to network conditions. This paper proposes a method that adapts the block size, which is divided into six grades. When network conditions are good, a larger block is used; when network conditions are poor, a smaller block is used, at the price of growing computational overhead. The method balances network transmission against detection cost.

Using these improved duplicated data detection and data backup techniques, this paper presents a data backup scheme based on duplicated data detection. The system consists of duplicated data detection, file state judgment, data backup, data encryption, and configuration management.
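The six-grade adaptive block size described above can be illustrated with a minimal sketch. The abstract does not give the actual block sizes, the network metric, or the hash function, so the values in `BLOCK_SIZES` and `THRESHOLDS_KBPS`, the choice of SHA-256, and both function names are assumptions made for illustration only. The detection step also compares blocks at fixed offsets, which is simpler than the rolling-checksum matching used by rsync.

```python
import hashlib

# Hypothetical block sizes in bytes for the six grades (grade 0 = worst network).
BLOCK_SIZES = [512, 1024, 2048, 4096, 8192, 16384]

# Hypothetical bandwidth thresholds in KB/s that separate the six grades.
THRESHOLDS_KBPS = [64, 256, 1024, 4096, 16384]


def choose_block_size(bandwidth_kbps: float) -> int:
    """Pick a larger block for a better network, a smaller one for a worse network."""
    grade = sum(bandwidth_kbps >= t for t in THRESHOLDS_KBPS)
    return BLOCK_SIZES[grade]


def changed_blocks(data: bytes, known_hashes: set, block_size: int) -> list:
    """Return the indices of blocks whose hash is not already stored."""
    changed = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        digest = hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        if digest not in known_hashes:
            changed.append(index)
    return changed
```

With smaller blocks, a single changed byte forces less data to be resent but more hashes must be computed and compared, which is the trade-off between transmission and detection cost that the abstract describes.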
The proposed scheme overcomes the common shortcomings of the three data backup schemes, reduces redundancy in the backed-up data, and improves backup efficiency.
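The idea of matching files by a stable identifier, so that a rename or move is not mistaken for a new file, can be sketched as follows. This is only an illustration under stated assumptions: each file is assumed to carry a stable ID string standing in for the NTFS BirthObject ID, reading that ID from the file system is not shown, and the function name `classify_files` and the state labels are hypothetical rather than taken from the thesis.

```python
def classify_files(previous: dict, current: dict) -> dict:
    """Compare two snapshots that map a stable file ID to a file path.

    `previous` is the snapshot from the last backup, `current` is the new scan.
    """
    states = {"same_path": [], "renamed_or_moved": [], "new": [], "deleted": []}
    for file_id, path in current.items():
        old_path = previous.get(file_id)
        if old_path is None:
            states["new"].append(path)  # ID never seen before
        elif old_path == path:
            states["same_path"].append(path)  # same ID, same location
        else:
            states["renamed_or_moved"].append((old_path, path))  # same ID, new path
    for file_id, old_path in previous.items():
        if file_id not in current:
            states["deleted"].append(old_path)  # ID no longer present
    return states
```

A file reported under `renamed_or_moved` would only need its path updated in the backup catalogue; whether its content also changed is a separate question left to block-level detection.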
Keywords/Search Tags: Data backup, File match, Duplicated data detection, Rsync