Research On Data Protection System For Hybrid Storage

Posted on: 2019-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Yu
Full Text: PDF
GTID: 1488306470993509
Subject: Computer application technology
Abstract/Summary:
In storage systems for big data applications, read and write performance is not the only concern: the security of key data must also be guaranteed through snapshots, continuous data protection (CDP), and other data protection technologies. This, however, challenges the storage system in terms of operating performance, capacity management, and reliability. The flash-based Solid State Disk (SSD) overcomes the limitations of the traditional Hard Disk Drive (HDD), offering high performance, low energy consumption, and resistance to shock and drops, and thus relieves a bottleneck in computer system performance. It has therefore been widely adopted by major storage vendors. However, because of their asymmetric read and write performance, erase-induced wear, and relatively high unit price, SSDs cannot yet completely replace HDDs, so hybrid storage combining HDDs and SSDs will remain the mainstream form of storage in the near future.

The design and optimization of hybrid storage systems is a hot topic in the storage field. Targeting workload requirements on data access performance, sequential/random characteristics, and hot and cold access patterns, a large body of research has addressed address mapping strategies (mapping granularity, mapping rules), hot/cold data classification, migration strategies, and the optimized allocation of storage media quantities. Building on a study of the key data protection technologies for hybrid storage, this thesis builds a hybrid storage system from SSDs and HDDs that exploits the advantages of both media to optimize the read and write performance, storage efficiency, reliability, and energy consumption of the data protection system. The main contributions are as follows.

1) Based on the workload features of the data protection system, a new design method for a data protection system on SSD/HDD hybrid storage is proposed, combining the strengths of both SSD and HDD. The system uses the SSD as the source volume, and HDDs form a disk array that serves as the CDP log volume and the snapshot volume. Under a read/write separation control strategy, the SSD responds to both read and write requests, while the HDD array responds only to write requests. The CDP log is thus recorded in a redirect-on-write (ROW) manner, avoiding read and write congestion on the source data volume. When generating a snapshot volume, the system merges the ROW CDP log into hierarchical snapshots, consisting of incremental and differential snapshots, according to user demand, which improves the storage efficiency and recovery efficiency of the data protection system. Experiments show that the hybrid storage system makes full use of the high-speed read and write characteristics of the SSD, so read and write performance is greatly improved. At the same time, the disk array provides timely data protection and recovery by virtue of its good sequential read and write performance. The hybrid storage system is therefore superior in read and write performance, storage capacity, and cost-effectiveness.

2) Storing CDP historical data as parity logs, as represented by TRAP-Array, effectively reduces storage capacity overhead but increases the potential risk in data recovery. This thesis proposes an efficient design method for a TRAP-Array continuous data protection system based on S-RAID. It uses an S-RAID disk array to store the TRAP parity log of the CDP system, which both reduces the storage space required and improves system reliability. For most database or Online Transaction Processing (OLTP) workloads, the data update rate is not high, so S-RAID, with its partial disk parallelism, can meet the bandwidth requirements of CDP logging. The thesis studies the system architecture of the S-RAID-based TRAP-Array CDP system, designs the functions and working modes of each functional module, and puts forward a corresponding data recovery strategy. Experiments show that in this system the CDP log greatly reduces storage space overhead and energy consumption, while S-RAID provides redundancy for the TRAP parity data through parity codes. When disk data errors occur, the data can be recovered from this redundancy information, which prevents the TRAP parity recovery chain from being broken and improves the reliability of the TRAP parity log. In addition, through its disk scheduling algorithm, S-RAID shifts disk groups that have no data requests into a standby state, further reducing energy consumption.

3) To address the loss of energy savings caused by random write requests in S-RAID, this thesis proposes the EPS-RAID structure, which adds an extra parity to optimize random read and write operations in S-RAID. With a parity disk and an SSD added to the S-RAID, a new parity is generated using the Reed-Solomon (RS) coding scheme to record, in a redirected manner, the random write requests addressed to standby disk groups. This avoids frequently activating standby disk groups for random read and write requests, gains more disk idle time, and effectively saves energy for S-RAID. According to the experiments, EPS-RAID is suited to backup and continuous data protection systems whose workload is dominated by sequential data access, and the number of disks in each group can be set to meet different performance and energy requirements according to the workload. Under the EXT4, NTFS, and NILFS file systems, the EPS-RAID structure effectively improves S-RAID write performance and reduces the energy consumption of the storage system. It is effective for applications with a large number of disk groups and with medium- or large-sized groups.
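
The abstract does not spell out how a TRAP parity log is built or why a damaged entry breaks the "recovery chain" mentioned in item 2). The following is a minimal sketch, assuming the commonly described TRAP scheme in which every write appends the XOR of the old and new block contents to a log, and an earlier version is rebuilt by XOR-ing the current block with the logged parities in reverse order. The class and method names (`TrapBlock`, `write`, `recover`) are hypothetical and are not taken from the thesis.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class TrapBlock:
    """One logical block plus its parity log (hypothetical illustration)."""

    def __init__(self, initial: bytes):
        self.current = initial
        # Entry i holds (contents before write i+1) XOR (contents after write i+1).
        self.parity_log = []

    def write(self, new_data: bytes) -> None:
        """Log the XOR of old and new contents, then update the block."""
        self.parity_log.append(xor_bytes(self.current, new_data))
        self.current = new_data

    def recover(self, version: int) -> bytes:
        """Return the contents after `version` writes (0 = initial contents)."""
        if not 0 <= version <= len(self.parity_log):
            raise ValueError("no such version")
        data = self.current
        # Walk the chain backwards, undoing one write per parity entry.
        for parity in reversed(self.parity_log[version:]):
            data = xor_bytes(data, parity)
        return data


if __name__ == "__main__":
    blk = TrapBlock(b"AAAA")
    for payload in (b"BBBB", b"CCCC", b"DDDD"):
        blk.write(payload)
    print(blk.recover(1))  # contents after the first write
```

Because each recovery step depends on every later log entry, a single corrupted parity makes all versions older than it unrecoverable in this sketch. That dependence is the motivation the abstract gives for placing the parity log on a redundant array.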
Keywords/Search Tags: Data protection system, Hybrid storage, Disk array, Data recovery, Energy saving, SSD