Research On Distributed Fault-tolerant Storage Optimization Technology Based On Regenerating Code

Posted on: 2017-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhu
GTID: 2308330485466381
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, large-scale data storage has become one of the key technologies of big data. Distributed storage systems are mostly deployed on inexpensive commodity machines, where node failure is the norm rather than the exception, so reliable mass data storage has become a research hotspot. The typical triplication policy suffers from high storage cost and poor fault tolerance, which has become a bottleneck for system expansion. In recent years, industry has widely adopted erasure codes as the fault-tolerance strategy of storage systems, but recovery with erasure codes consumes excessive bandwidth. Researchers have therefore turned to regenerating codes based on network coding. Regenerating codes can achieve optimal bandwidth overhead, but their high computational cost and other issues have hindered wide application. In addition, most storage systems use a single encoding method as their only fault-tolerance strategy, which ignores the differences among stored files.

The aim of this paper is to build a distributed storage system with low redundancy, high availability, and high reliability, using the Cumulus storage system as the platform. The main work includes the following aspects:

1. To address the deficiencies of existing coding methods in storage efficiency, this paper weighs the trade-offs among access latency, bandwidth, repair, and computational complexity, and introduces simple regenerating code (SRC). It also optimizes SRC for degraded reads. Both SRC and opSRC are implemented in the Cumulus system and compared with the other coding schemes. Experimental results show that SRC adds only a small storage overhead while greatly reducing the cost of recovery.

2. Files in a storage system have life cycles and differing access frequencies. The adaptive coding model combines file state with system state and includes the transformation between two forms of encoding. Taking XOR and simple regenerating code as the encoding methods, this paper describes in detail the construction and operation of the adaptive coding model. Experimental results show that adding adaptive coding to a distributed storage system yields higher storage efficiency and lower repair costs.
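
As a rough illustration of the XOR coding mentioned in the abstract, the sketch below shows how a single XOR parity block lets a lost block be rebuilt from the surviving blocks of its group. It is not the thesis's SRC, opSRC, or Cumulus implementation; the function names (xor_blocks, encode, repair) and the in-memory stripe layout are assumptions made only for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Append one XOR parity block to the data blocks (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def repair(stripe, lost_index):
    """Rebuild the block at lost_index from all other blocks in the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])
    # Pretend the node holding block 1 failed, then rebuild it.
    rebuilt = repair(stripe, 1)
    print(rebuilt == stripe[1])
```

Because XOR is its own inverse, the same routine repairs either a data block or the parity block. A real system would additionally distribute the blocks across nodes and combine the XOR step with an erasure code, which this sketch leaves out.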
Keywords/Search Tags: Distributed storage, HDFS, recovery cost, simple regenerating code, adaptive coding