Research On Key Technology Of Data Maintenance Based On Erasure Code Storage

Posted on: 2014-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Zheng
GTID: 2308330479479342
Subject: Software engineering
Abstract/Summary:
With the rapid development of cloud storage and big data technologies, data survivability is becoming increasingly important, and fault tolerance is the main way to ensure it. Today, fault tolerance is implemented mainly through redundancy, and the mainstream redundancy techniques are replication and erasure codes. Erasure codes offer far higher storage efficiency than replication, but they incur a large data maintenance overhead.

In this paper, we study the data maintenance overhead of erasure-coded storage and identify two major problems: 1. the communication overhead between nodes during maintenance is too large; 2. the computational overhead of data recovery is too large. We propose three solutions to these two problems.

First, we propose a passive-repair algorithm from the perspective of data reads. The main idea is to use the system's normal read bandwidth for data detection. If the data needs to be repaired, the decoded data is cached on the local disk for future repair. Data detected in this way does not need to be detected, downloaded, and decoded again when it is repaired, which reduces communication and repair overhead. To our knowledge, no similar approach exists.

Second, we propose an adaptive-detection algorithm from the perspective of system reliability, which reduces data recovery and communication overhead. The main idea is to adjust the data maintenance frequency according to the reliability of the system: a system with high reliability has a low maintenance frequency. Compared with existing methods, the main difference is that the data maintenance frequency changes dynamically with the reliability of the system, which lowers the maintenance frequency while still ensuring data availability.

Third, we propose a tolerance selection algorithm from the perspective of data classification. The main idea is to apply different maintenance frequencies to data with different access patterns (in a real storage system, access patterns differ across data [64]). For data that is not frequently accessed, lowering the maintenance frequency reduces the cost of communication and repair. To our knowledge, no erasure-code-based storage system takes data classification into account when maintaining data, and there is no research in this area yet.

Finally, we implement a prototype system based on an open-source distributed file system (using erasure coding for redundancy) and propose a hierarchical development method, which greatly reduced the time needed to develop the prototype. We verify the prototype and the algorithms through simulation and tests on the actual system; the results show that the proposed scheme is effective.
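
The abstract does not give the formulas behind the adaptive-detection and tolerance selection algorithms, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes an (n, k) erasure code in which any k of n fragments can decode a stripe, measures reliability as the share of tolerable fragment losses still unused, and lengthens the check interval for rarely read data. The names `StripeState`, `redundancy_margin`, and `maintenance_interval`, the linear scaling, and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StripeState:
    """Health of one erasure-coded stripe (all fields are illustrative)."""
    n: int                # total fragments written (data + parity)
    k: int                # fragments needed to decode the stripe
    live: int             # fragments currently reachable
    reads_per_day: float  # observed access rate, used to classify hot vs. cold data


def redundancy_margin(s: StripeState) -> float:
    """Fraction of the tolerable fragment losses that is still unused (0.0 to 1.0)."""
    if not 0 < s.k < s.n:
        raise ValueError("expected 0 < k < n")
    if s.live < s.k:
        raise ValueError("stripe is no longer decodable")
    return (s.live - s.k) / (s.n - s.k)


def maintenance_interval(s: StripeState,
                         base_hours: float = 1.0,
                         max_hours: float = 24.0,
                         hot_threshold: float = 10.0) -> float:
    """Hours until the next integrity check for this stripe.

    Adaptive part (assumed): the interval grows linearly with the margin,
    so a healthy stripe is checked less often than a degraded one.
    Classification part (assumed): data read fewer than `hot_threshold`
    times per day is treated as cold and its interval is doubled.
    """
    interval = base_hours + redundancy_margin(s) * (max_hours - base_hours)
    if s.reads_per_day < hot_threshold:
        interval *= 2.0
    return interval
```

With these assumed defaults, a fully healthy hot stripe under a (14, 10) code is checked every 24 hours, while the same stripe with only 10 reachable fragments is checked every hour. A real system would need to choose the scaling function and thresholds from its measured failure rates.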
Keywords/Search Tags: distributed storage, erasure code, data survivability, data maintenance