Design And Implementation Of Distributed Storage System Based On Binary Array Code

Posted on:2020-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:J R Ze

GTID:2428330590978622

Subject:Electronic and communication engineering

Abstract/Summary:

In the era of big data, storing massive amounts of data poses serious challenges. Distributed storage systems, with their large capacity, flexible scalability, low cost, and high reliability, have become the main choice for massive data storage. As the amount of stored data grows, the number of storage nodes increases and node failures become more frequent, which makes the system less stable. Distributed systems commonly rely on replication for fault tolerance, but replication is costly when storing massive data. Erasure codes improve space utilization but bring a new problem: repairing failed data requires disk I/O and network traffic that are k times the size of the failed data. This repair amplification problem seriously degrades the degraded-read and repair performance of a distributed storage system and consumes a large amount of network bandwidth. How to solve it has therefore become an important topic in erasure code research.

Because reducing repair bandwidth effectively mitigates repair amplification, this thesis introduces a binary Maximum Distance Separable (MDS) array code with asymptotically optimal repair bandwidth and designs a Coded Distributed File System (Coded-DFS). Built on the Hadoop Distributed File System (HDFS), Coded-DFS provides triple fault tolerance and optimal repair bandwidth. Coded-DFS is deployed on a cluster, and detailed functional tests and performance evaluation experiments are carried out. The experimental results show that Coded-DFS not only ensures data reliability but also greatly reduces repair bandwidth and significantly improves degraded-read and repair efficiency.

The main work and research contents of this thesis are as follows:

1. The research status of binary array codes is surveyed, and the single-node repair amplification problem is discussed and analyzed. The current use of erasure codes in distributed storage systems is investigated, together with the shortcomings and requirements of erasure-coded distributed storage systems. A binary MDS array code with asymptotically optimal repair bandwidth, the NBMA (New Binary MDS Array) code, is introduced and studied in terms of both theory and application feasibility, and a concrete implementation algorithm is given.

2. The working mechanism of HDFS is analyzed, and the design scheme and implementation techniques for combining erasure codes with the HDFS platform are studied. The functions of encoding, downloading, block reading, and file state detection are redesigned for the array code through the Hadoop API.

3. Based on the idea of relay repair, a coded storage system (Coded-DFS) that includes the NBMA code is implemented, and the encoding, repair, and decoding functions are optimized from an engineering point of view. The system provides a fast way to implement and functionally test erasure codes.

4. A real erasure-coded distributed storage cluster is built, and the reliability of the NBMA code is analyzed by simulating node failures. The NBMA code is compared with other codes in terms of encoding efficiency, decoding efficiency, repair bandwidth, computational complexity, and disk I/O. Experiments show that, under suitable parameters, the repair bandwidth of the NBMA code is about 45% less than that of the CRS code and about 25% less than that of the X-code.
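The trade-off between replication and erasure coding described in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a generic systematic MDS code with k data blocks and r parity blocks, and the values k = 6, r = 3, and a 128 MB block size are hypothetical parameters, not figures from the thesis. It models conventional MDS repair, which reads k surviving blocks to rebuild one lost block. It does not model the NBMA code or its reduced repair bandwidth.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> Fraction:
    """Bytes stored per byte of user data when keeping full copies.

    Surviving f failures requires f + 1 copies of every block.
    """
    return Fraction(failures_tolerated + 1, 1)


def mds_overhead(k: int, r: int) -> Fraction:
    """Bytes stored per byte of user data for an MDS code with
    k data blocks and r parity blocks (tolerates any r failures)."""
    return Fraction(k + r, k)


def conventional_repair_read_mb(k: int, block_size_mb: int) -> int:
    """Data read and transferred to rebuild ONE lost block when the
    repair reads k whole surviving blocks (conventional MDS repair)."""
    return k * block_size_mb


# Hypothetical parameters, chosen only for illustration.
k, r, block_mb = 6, 3, 128

print("replication overhead:", replication_overhead(r))
print("MDS code overhead:   ", mds_overhead(k, r))
print("lost data (MB):      ", block_mb)
print("repair traffic (MB): ", conventional_repair_read_mb(k, block_mb))
print("amplification factor:", conventional_repair_read_mb(k, block_mb) // block_mb)
```

With these assumed parameters, the erasure code stores far less redundant data than replication with the same fault tolerance, but rebuilding a single block moves k blocks' worth of data. That is the repair amplification that codes with reduced repair bandwidth aim to shrink.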

Keywords/Search Tags:

Distributed storage system, Erasure code, Binary array code, Repair bandwidth

Related items

1	The Research About The Encoding And Repair Mechanism Based On Erasure Code In The Distributed Storage System
2	Update Bandwidth For Distributed Storage
3	Research On Multi Stripe Repair Of Erasure Code In Distributed Storage System
4	Array Codes With Local Repair Property In Distributed Storage System
5	Research On Multi-Strip Repair Of Load Balanced Erasure Code In Heterogeneous Distributed Storage System
6	Research Of Piggybacking Design For Systematic MDS Code In Distributed Storage Systems
7	Research On Data Repair Techniques In Erasure-Coded Storage Systems
8	Research On Backup And Repair Technologies Based On Erasure Codes In Distributed Storage Systems
9	The Research On Multi-node Repair Problem Of Distributed Storage System
10	Research On Data Writing Performance Optimization In Erasure Code Fault-Tolerant Storage System