Research On The Capacity And Bandwidth Tradeoff Of Heterogeneous Distributed Storage Systems Based On Separate Nodes

Posted on:2020-04-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Z Wang

Full Text:PDF

GTID:1368330623963937

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

In the era of big data,huge amounts of data are generated,transmitted,processed,stored and increasing exponentially with the rapid development of Internet application technology.In order to satisfy the storage requirements of massive data,distributed storage systems(DSSs)are widely used and investigated due to their low cost,high scalability,high access speed,high reliability,support for higher concurrent access,etc.In DSSs,data are stored in multiple nodes(servers)connected through networks,where node failures become quite prevalent as the system scale and number of nodes increasing.Erasure codes are widely used to ensure data reliability in commercial DSSs such as Microsoft Azure,Google GFS and Taobao TFS.Traditional erasure codes(such as Reed-Solomon codes)can greatly reduce data redundancy while ensuring high reliability,but require a large amount of network bandwidth when repairing failed nodes.To balance the tradeoff between storage and repair bandwidth,Dimakis et al.used the information flow graph to model distributed storage systems and defined the capacity of the DSS model.They formulated optimal tradeoff bound of node storage and repair bandwidth,based on which the construction problems of minimum storage regenerating(MSR)code and minimum bandwidth regenerating(MBR)code were proposed.Rashmi et al.constructed optimal exact MSR and MBR codes with product matrix.There are lots of works on the tradeoff bounds and regenerating code constructions for the homogeneous DSS model where all the nodes have the same parameters(storage capacity,repair bandwidth,number of helper nodes etc.).In realistic heterogenous DSSs,storage nodes are generally organized with clusters(racks),where the intra-cluster networks are often faster and cheaper,while the cross-cluster bandwidths are limited.Therefore,it is natural to download more data from intra-cluster nodes when repairing a failed one.When considering the cluster structures,there mainly are two types of models,namely,with or without a relay node in each cluster,where the relay node collects data from intra-cluster nodes and transmits to cross-cluster nodes.Hu et al.proposed coding strategies to minimize the cross-cluster bandwidth when repairing a failed node in the model with relay nodes.Prakash et al.also investigated cluster DSS model with relay nodes which were not only used for node repair,but also used for data collection.On the other hand,Sohn et al.focused on the cluster DSS model without relay nodes,where the intra-cluster and cross-cluster bandwidths were analysed as two separate parameters.In Sohn’s model,they assume that all the alive nodes are used to repair a failed one.Although this assumption maximizes the system capacity,the reconstruction read cost(the number of helper nodes)is also maximum,which is not flexible to analyse multiple node failures.In addition,separate nodes are also prevalent in realistic DSSs,but relative research works are insufficient,which motivates our work in this thesis.First,we investigate the cluster model without relay nodes under more generalized assumptions where the newcomer does not have to download data from all the alive nodes.We only assume the intra-cluster nodes are all used,while the number of cross-cluster helper nodes are analysed as a variable parameter.It is obvious that the traditional homogeneous and cluster DSS model can be retrieved with our generalized model.Moreover,we further investigate the DSS model with clusters and separate nodes by analysing min-cuts,which is inductive theoretically to construct regenerating codes utilizing the realistic network bandwidths efficiently.The main contributions of this thesis are as follows.First,we characterize the capacity and tradeoff bound between storage and repair bandwidth of the cluster DSS model with generalized system parameters.When repairing a failed node,the newcomer does not have to download data from all the alive nodes.We only assume the intra-cluster alive nodes are all used,because the intra-cluster networks are generally faster and cheaper.A regenerating code instance is constructed based on the tradeoff bound of the cluster DSS model.Second,we formulate the capacity and tradeoff bounds after adding a separate node to the cluster DSS model.We theoretically characterize the relationship between system parameters and capacity after adding a separate node.The influence of cluster and separate node repair bandwidth constraints to the tradeoff bound is also analysed.Specifically,we prove that when each cluster contains R nodes and any k nodes suffice to recover the original file(MDS property),adding an extra separate node will keep the capacity if R|k,and reduce the capacity otherwise.Similarly,we give the MSR code construction with interference alignment for the case with one separate node.Finally,we investigate the DSS model with clusters and multiple separate nodes,which is also named the heterogenous cluster DSS(HC-DSS)model,because the separate node can be seen as a special cluster with only one node.We formulate the capacity and tradeoff bound for the case with multiple separate nodes.Meanwhile,the storage and repair bandwidth tradeoff bounds are analysed in multiple aspects where the number of helper nodes,the number of separate nodes and repair bandwidth constraints vary respectively.Additionally,we also construct a kind of MSR code for the HC-DSS model.

Keywords/Search Tags:

distributed storage system, cluster, separate node, information flow graph, regenerating code, network coding

PDF Full Text Request

Related items

1	The Research On Multi-node Repair Problem Of Distributed Storage System
2	Research On Repairing Algorithm Of Node In The Distributed Storage System
3	Research On Distributed Fault-tolerant Storage Optimization Technology Based On Regenerating Code
4	Research On Storage Node Selection And Node Repair Technology In Distributed Storage System
5	Research On The Related Problems Of Subspace Code And Distributed Storage System Based On Network Coding
6	Research On The Repair Of Failure Node In Distributed Storage Systems
7	Performance Analysis And Optimal Design Of Regenerating Codes In Distributed Storage Systems
8	Research And Application Of Regenerating Codes In Distributed Storage System
9	Research On Data Redundancy And Maintenance Technology In Distributed Storage System
10	Study On Fault-tolerant Mechanisms Of Distributed Storage Systems Based On Network Coding