Research On Data Equalization Method Of Large Scale Erasure Code Storage System

Posted on: 2022-07-30; Degree: Master; Type: Thesis
Country: China; Candidate: H Y He; Full Text: PDF
GTID: 2518306350489494; Subject: Master of Engineering
Abstract/Summary:
With the rapid development of computer technology, storage systems have become increasingly large and complex, and redundancy schemes based on multiple replicas consume too much space. To save storage space, storage systems use erasure-code redundancy instead. However, in a large-scale erasure-coded storage system, rapid encoding, updating, and failure recovery often leave data unevenly distributed across the cluster, which degrades cluster performance. Some existing schemes balance data during encoding and recovery, but these operations increase encoding delay and reduce system performance. Other existing schemes balance data only for a single node, without considering the overall layout of the cluster.

To solve the data balancing problem in large-scale erasure-coded storage systems, this thesis proposes a data balancing scheduling strategy for cloud storage. The strategy distinguishes the access heat of data blocks and parity blocks within nodes and executes scheduling in two stages: in-rack scheduling is given priority, and cross-rack scheduling is considered afterwards. For the cluster as a whole, the task execution order is determined by a weighted combination of multiple parameters, and the layout of hot data is optimized so that the whole cluster is balanced while the constraints of a large-scale erasure-coded system are still met. The main contents of this thesis cover the following three aspects.

First, because data transmission within a rack is faster than transmission across racks, in-rack transmission is given priority. The racks in the cluster are sorted by how idle or congested their data is, which determines the priority of data balancing and speeds up the balancing process for the whole cluster. In addition, the network usage of foreground applications is taken into account: the bandwidth available to balancing is limited so that data balancing is fast and causes little interference.

Second, a two-stage scheduling algorithm for cloud storage data balancing is proposed. For the data blocks in each node, a data balancing method based on the priority of hot and cold data is proposed. The algorithm uses several priority queues to balance the data volume of the nodes. It also prevents hot data from concentrating on one node and overloading that node with reads, which yields a better load distribution. In the two-stage algorithm, tasks that can be completed within a rack are handled by the first-stage in-rack scheduling algorithm, and tasks that cannot be scheduled in the first stage are handled by the second-stage cross-rack scheduling algorithm. The algorithm reduces cross-rack data transmission, lowers the overall time overhead, and efficiently balances the data of all nodes in the cluster.

Finally, the performance of the cloud storage data balancing scheduling strategy is evaluated from four aspects: foreground and background network bandwidth usage, data balancing time, data block distribution, and load distribution. The simulation results show that the strategy is suitable for large-scale erasure-coded storage systems, causes little interference to foreground applications, and balances data quickly. Compared with previous methods, the data balancing time is reduced by 18.1%, and the load distribution is optimized so that hot data is not concentrated on the same node.
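The two-stage idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm: the class and function names (`Block`, `Node`, `pick_block`, `balance`, `two_stage_balance`), the greedy move rule, the "at most one block of a stripe per node" constraint, and the hot-block preference are all assumptions made for illustration. Bandwidth limiting, multi-parameter weighting, and rack-level fault-tolerance constraints are left out.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Block:
    block_id: str
    stripe: str          # erasure-code stripe the block belongs to
    hot: bool = False    # assumed flag: True if the block is read often


@dataclass
class Node:
    name: str
    rack: str
    blocks: list = field(default_factory=list)

    def load(self):
        return len(self.blocks)

    def hot_count(self):
        return sum(b.hot for b in self.blocks)

    def has_stripe(self, stripe):
        return any(b.stripe == stripe for b in self.blocks)


def pick_block(src, dst):
    """Pick a block to move from src to dst, or None if none is allowed."""
    # Assumed constraint: a node holds at most one block of any stripe.
    allowed = [b for b in src.blocks if not dst.has_stripe(b.stripe)]
    if not allowed:
        return None
    # Move a hot block only when that narrows the hot-block gap.
    want_hot = src.hot_count() - dst.hot_count() >= 2
    return min(allowed, key=lambda b: b.hot != want_hot)


def balance(nodes, stage, moves, cross_rack_only=False):
    """Greedily move blocks from fuller nodes to emptier nodes."""
    while True:
        donors = sorted(nodes, key=lambda n: -n.load())
        takers = sorted(nodes, key=lambda n: n.load())
        for src, dst in product(donors, takers):
            if src.load() - dst.load() < 2:
                continue  # moving a block would not reduce imbalance
            if cross_rack_only and src.rack == dst.rack:
                continue
            blk = pick_block(src, dst)
            if blk is not None:
                src.blocks.remove(blk)
                dst.blocks.append(blk)
                moves.append((stage, blk.block_id, src.name, dst.name))
                break
        else:
            return  # no legal move left


def two_stage_balance(nodes):
    """Return the list of (stage, block_id, source, destination) moves."""
    moves = []
    racks = {}
    for n in nodes:
        racks.setdefault(n.rack, []).append(n)

    def spread(members):
        loads = [n.load() for n in members]
        return max(loads) - min(loads)

    # Stage 1: in-rack moves, most imbalanced rack first.
    for members in sorted(racks.values(), key=spread, reverse=True):
        balance(members, "in-rack", moves)
    # Stage 2: cross-rack moves for what stage 1 could not fix.
    balance(nodes, "cross-rack", moves, cross_rack_only=True)
    return moves
```

Calling `two_stage_balance` on a list of `Node` objects mutates their `blocks` lists and returns the moves in execution order, so the in-rack moves always appear before any cross-rack move. Each move is made only when the source holds at least two more blocks than the destination, which guarantees the loop terminates.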
Keywords/Search Tags: Erasure code, storage system, data balance, scheduling