With the acceleration of digitalization, the demand for storing massive data is constantly increasing, which poses significant challenges for data storage and management. Distributed storage systems use two data redundancy strategies, multiple replicas and erasure coding, to ensure high system reliability. Compared with multiple replicas, erasure coding is widely applied in commercial storage systems and data centers because of its low redundancy and high fault tolerance. However, the data writing process introduces additional encoding computation overhead and cross-node data transmission, so the efficiency of erasure-coded data writing is difficult to improve, which negatively affects system performance and reliability. Existing solutions for accelerating the erasure coding data writing process fall into hardware acceleration and software algorithm acceleration. Stacking hardware increases cost and is difficult to port. Therefore, optimizing the erasure coding data writing process from a software algorithm perspective is of significant importance. To address the issues identified in the analysis of the erasure coding data writing process, this paper studies two aspects: the erasure coding encoding computation process and the network data transmission process. Specifically, the main work and innovations of this paper are as follows:

(1) This paper proposes an optimization scheme for the encoding matrix and encoding scheduling of RS (Reed-Solomon) Cauchy codes to address the high computational complexity and low data writing rate that arise when erasure codes are used for data fault tolerance in distributed storage systems. Firstly, a greedy optimization algorithm is proposed to select the generator matrix of the RS Cauchy code. By calculating the sparsity of the corresponding XOR matrix in the finite field, an initial row solution is greedily constructed, and an optimized sparse Cauchy matrix is obtained through traversal, thereby reducing the computational complexity of the encoding process. Secondly, a GA-CSHR algorithm is proposed to optimize the binary matrix encoding process after the Cauchy matrix transformation. This method uses a genetic algorithm to address the shortcomings of the CSHR (Code Specific Hybrid Reconstruction) and Uber-CSHR (Uber-Code Specific Hybrid Reconstruction) algorithms. By caching intermediate values in the encoding process and heuristically searching for the target block, the number of XOR calculations in the encoding process is reduced. Experimental results show that the RS Cauchy code encoding optimization scheme based on the generator matrix transformation and the genetic algorithm reduces computational complexity by 40% compared to the original RS Cauchy code.

(2) To address the low data writing efficiency, high transmission latency, and high traffic consumption of the data writing process in complex heterogeneous network environments, a tree-based encoding topology optimization scheme built on link reuse and traffic aggregation is proposed. The scheme aims to reduce transmission latency and traffic consumption during data writing in a heterogeneous network environment, and to improve data writing efficiency and the degree of system load balancing. Firstly, the data transmission that takes place during RS-coded data writing is analyzed in depth, and the topology construction for RS-coded data writing is transformed into the problem of finding the minimum Steiner tree forest. A calculation model for transmission latency and traffic consumption is established, and a data writing topology that accounts for both transmission latency and traffic consumption is obtained by solving the model. Finally, a genetic algorithm is used to optimize the objective function. Simulation experiments show that, in the fat-tree network architecture, the proposed scheme provides lower transmission latency and traffic consumption and improves the degree of system load balancing compared to the star coding topology, pipeline coding topology, and traditional tree coding topology schemes.
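
The sparsity measure behind the generator matrix selection in item (1) can be illustrated with a small sketch. Each element of GF(2^w) expands into a w x w binary matrix, and the number of 1 bits in that matrix determines how many XOR operations the element costs during encoding. The code below is only an illustration under stated assumptions, not the algorithm proposed in this paper: the field size (w = 4), the primitive polynomial, the fixed column set Y = {0, ..., k-1}, and the helper names `ones`, `row_cost`, and `greedy_rows` are all hypothetical choices made for the example.

```python
# Illustrative sketch only: GF(2^4) with the primitive polynomial x^4 + x + 1.
# The field size, the fixed Y set, and the greedy rule are assumptions,
# not the paper's actual algorithm.

W = 4        # bits per field element (assumed)
POLY = 0x13  # x^4 + x + 1, including the x^4 term


def gf_mul(a, b):
    """Multiply two elements of GF(2^W) (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << W):
            a ^= POLY
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (fine for tiny fields)."""
    for b in range(1, 1 << W):
        if gf_mul(a, b) == 1:
            return b
    raise ZeroDivisionError("0 has no inverse")


def ones(e):
    """Number of 1 bits in the W x W binary matrix of element e.

    Column j of that matrix is the bit pattern of e * x^j.
    """
    return sum(bin(gf_mul(e, 1 << j)).count("1") for j in range(W))


def row_cost(x, ys):
    """1 bits contributed by the Cauchy row with entries 1 / (x + y)."""
    return sum(ones(gf_inv(x ^ y)) for y in ys)


def greedy_rows(k, m):
    """Pick the m row elements with the sparsest rows, with Y fixed to 0..k-1."""
    ys = list(range(k))
    candidates = range(k, 1 << W)  # row elements must be disjoint from Y
    return sorted(candidates, key=lambda x: row_cost(x, ys))[:m]


if __name__ == "__main__":
    print(greedy_rows(k=4, m=2))
```

Ranking rows independently is the simplest possible greedy rule. A fuller search would also vary the column set and, as the abstract describes, refine the initial row solution through traversal.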