| Erasure coding is widely used in cloud storage systems to provide strong fault tolerance at low storage cost. To deliver an efficient, low-cost, and highly available storage service, this dissertation studies network transmission optimization for the two core problems of erasure coding: data repair and data update. Its contributions are as follows. First, this dissertation presents Loop Repair, a unified data repair scheme for full-node repair. Loop Repair builds on Bi-Loop, a scheduling framework with two loops (an in-rack loop and a cross-rack loop) that covers both homogeneous environments (identical link bandwidth) and heterogeneous environments (differing link bandwidth). The in-rack loop handles data repair in homogeneous environments, and the combination of the two loops handles data repair in heterogeneous environments. Loop Repair rests on three design principles: (i) targeting full-node repair; (ii) fully utilizing the upstream and downstream bandwidth of all healthy nodes; and (iii) repairing in a circular, load-balanced manner. Theoretical analysis shows that Loop Repair performs best in homogeneous environments, and the dissertation gives the range of the time overhead in both homogeneous and heterogeneous environments. Local cluster experiments show that Loop Repair improves data repair throughput by at least 30% compared with existing data repair schemes. It is also generic and can be applied to different erasure codes and workflows. Second, this dissertation proposes CAU-DB (CAU-Delta Batch), an update scheme with a trim merging strategy. CAU-DB improves on the CAU algorithm in two ways: (1) whereas CAU transfers the whole data block for each update, CAU-DB uses a trim merging strategy to reduce update traffic; (2) CAU-DB uses XOR-based updates built on a bitmatrix to improve update efficiency. Comprehensive local cluster experiments show that CAU-DB improves update throughput by at least 19.4% compared with existing data update schemes while reducing cross-rack traffic by at least 16.7% on average. Third, this dissertation proposes T-Update B (T-Update Batch), an update scheme with a gradual merging strategy. T-Update B brings batch updates and XOR into T-Update, thereby combining four key network optimization techniques (XOR, delta transmission, data forwarding, and batch updates). T-Update B also provides a trace-driven gradual merging strategy to correct the excessive merging of batch updates. Comprehensive local cluster experiments show that T-Update B improves update throughput by 50%-600% while reducing cross-rack traffic by 29.2%-81.8%. It is likewise generic and adapts to different erasure codes and workflows. Theoretical analysis and experimental results show that the three fault-tolerance techniques proposed in this dissertation for cloud storage are efficient, low-cost, and generic, and can be applied to various storage systems such as P2P storage systems, distributed storage systems, and cloud storage systems. |
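
The idea shared by the two update schemes, sending a delta instead of a whole block and merging a batch of updates before transmission, can be illustrated with a minimal sketch. It is not the CAU-DB or T-Update B algorithm: it assumes a toy code with a single XOR parity block, and the helper names (`xor_bytes`, `encode_parity`, `merged_delta`, `apply_delta`) are hypothetical. It shows only that successive overwrites of one data block fold into one delta, and that applying this delta to the parity matches a full re-encode.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(blocks: Sequence[bytes]) -> bytes:
    """Single XOR parity over all data blocks (assumed toy code, not RS)."""
    return reduce(xor_bytes, blocks)


def merged_delta(old: bytes, versions: Sequence[bytes]) -> bytes:
    """Fold a batch of successive overwrites of one block into one delta.

    Intermediate versions cancel under XOR, so the result equals
    old XOR the last version, and only one delta has to be sent.
    """
    delta = bytes(len(old))  # all-zero delta: no change yet
    prev = old
    for new in versions:
        delta = xor_bytes(delta, xor_bytes(prev, new))
        prev = new
    return delta


def apply_delta(parity: bytes, delta: bytes) -> bytes:
    """Update the parity without re-encoding: P' = P XOR delta."""
    return xor_bytes(parity, delta)


if __name__ == "__main__":
    blocks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
    parity = encode_parity(blocks)
    # Three overwrites of block 1 arrive in one batch.
    versions = [b"\x11\x22", b"\x00\xff", b"\x7f\x01"]
    delta = merged_delta(blocks[1], versions)
    updated = [blocks[0], versions[-1], blocks[2]]
    print(apply_delta(parity, delta) == encode_parity(updated))
```

In this sketch the parity node receives one block-sized delta per batch rather than one transfer per update, which is the source of the traffic saving; a real scheme would additionally trim the delta to the modified byte range and multiply it by the code's coefficients.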