Research on File System Level Continuous Data Protection Technology in Distributed Storage Systems

Posted on: 2010-11-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Yao
Full Text: PDF
GTID: 1118360302471164
Subject: Computer system architecture
Abstract/Summary:
In the current information age, as data volumes grow explosively and the value of data rises, storage systems must take efficient measures to guarantee reliability, availability, and performance. At the same time, computer viruses, hacker attacks, and user errors can damage data validity. Some applications have strict requirements for data history and the backup window, and reducing the interval between backups is the only way to meet these requirements. Traditional data backup techniques cannot respond to these challenges. Continuous Data Protection (CDP) technology is designed for this situation: backup data is created on every data change, so CDP can provide the minimal recovery point objective (RPO). Meanwhile, distributed storage systems have become a focus of attention for meeting mass-data demand, because they provide high performance and scale well to larger capacity. Applying CDP to a distributed storage system brings a better RPO to the storage system, while the distributed architecture can help optimize the time overhead and the amount of backup data. A distributed file system that integrates the management of continuous data backup metadata brings the CDP function to the distributed storage system.

First, a new distributed file system is proposed, based on a study of current data protection schemes. This thesis presents a detailed design for this distributed file system, which manages file system metadata and backup data metadata in an integrated way. It extends the traditional metadata structure from one dimension to two by adding a time information field to the metadata. This metadata management uses a strategy targeted at CDP applications, which reduces the overhead of backup operations. The file system can provide an online history view of a specific file, or of the whole file system, at a specific point in time.

Second, to further reduce the time overhead of CDP, several designs are presented that take advantage of the distributed architecture. Data operations are performed concurrently when clients need to communicate with the data storage servers. Large data I/O requests are optimized to take advantage of these concurrent operations. Prefetching reads and combined write requests are proposed to improve the performance of traditional small read and write requests. Data transfer, data backup, and data recovery operations can all benefit from these strategies.

Third, a particular kind of data change is studied: one that causes large blocks of data to move within a file, such as inserting or deleting data at a specific offset. Under the current CDP scheme, such changes produce more backup data. Two new file operation semantics are proposed to solve this problem. Newly developed applications can use them directly to reduce the amount of backup data, while traditional applications are optimized transparently by the client agent program.

Finally, two methods based on data content comparison are presented to further reduce the amount of backup data. The system decides whether to perform a backup operation by comparing the MD5 codes of the new data and the old data. If the two MD5 codes are the same, the system does not create a backup of the old data. If they differ, an additional XOR operation is performed to find the part of the data that actually differs, eliminating the redundancy.
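The content-comparison step described at the end of the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the byte-level granularity, and the assumption that the old and new blocks have equal length are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import hashlib


def md5_hex(block: bytes) -> str:
    """Return the MD5 digest of a data block as a hex string."""
    return hashlib.md5(block).hexdigest()


def changed_ranges(old: bytes, new: bytes) -> list[tuple[int, int]]:
    """XOR two equal-length blocks byte by byte and return the
    (offset, length) runs where they differ.

    Equal length is an assumption of this sketch.
    """
    if len(old) != len(new):
        raise ValueError("blocks must have the same length")
    ranges = []
    start = None
    for i, (a, b) in enumerate(zip(old, new)):
        if a ^ b:  # non-zero XOR means the bytes differ
            if start is None:
                start = i
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:
        ranges.append((start, len(old) - start))
    return ranges


def backup_for_overwrite(old: bytes, new: bytes) -> list[tuple[int, bytes]]:
    """Decide what to preserve before `old` is overwritten by `new`.

    Hypothetical helper: returns an empty list when the MD5 codes match
    (no backup is created), otherwise only the old bytes that actually
    change, located with XOR.
    """
    if md5_hex(old) == md5_hex(new):
        return []
    return [(off, old[off:off + n]) for off, n in changed_ranges(old, new)]
```

For example, `backup_for_overwrite(b"hello world", b"hello WORLD")` keeps only the five old bytes starting at offset 6, while rewriting a block with identical content produces no backup data at all.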
Keywords/Search Tags:Distributed storage system, File system, Continuous data protection, History view, File system semantic, Differences data backup