Research On Storage Node Selection And Node Repair Technology In Distributed Storage System

Posted on: 2020-05-06, Degree: Master, Type: Thesis
Country: China, Candidate: Y M Huan, Full Text: PDF
GTID: 2428330599459707, Subject: Information and Communication Engineering
Abstract/Summary:
The traditional local storage model can no longer meet users' requirements for storing huge amounts of data. Distributed storage provides a new architecture for solving the problem of massive data storage. Distributed storage systems generally use data distribution strategies to spread stored data across multiple storage nodes. At the same time, a fault tolerance mechanism is introduced to ensure data security and integrity, and failed nodes are repaired according to the rules of that mechanism. Storage node selection and failed node repair have therefore become two key entry points for the study of distributed storage systems.

The work on storage node selection in this paper focuses on the Ceph distributed storage system. The original storage node selection algorithm in Ceph considers only the current remaining capacity of a storage node as the selection criterion when choosing nodes for multiple copies. When the network fluctuates or the internal load of a storage node increases because of other services, the read and write performance of Ceph decreases. It is therefore necessary to use the link quality between storage nodes and the storage node load as indicators for evaluating a storage node selection scheme.

The work on failed node repair is based on distributed storage systems that use erasure-code-based fault tolerance. Compared with multi-copy fault tolerance, traditional erasure coding effectively improves storage space utilization in a distributed storage system, but it incurs greater network cost during data recovery. Improved erasure codes such as regenerating codes are therefore generally used to reduce the amount of data transmitted. The node repair process based on regenerating codes must take the storage node network topology and link quality into account and construct a bandwidth-optimal repair tree with the new node as the root and the remaining nodes as the participating repair nodes. Most existing work solves this problem with approximate, locally optimal repair tree construction methods, which do not reach the global optimum in node repair time.

The work and innovations of this paper on the node selection problem and the single-node fault repair problem are as follows:

(1) Node selection: Optimizing the Ceph node selection algorithm requires measuring network parameters, but obtaining network state parameters in a traditional network architecture requires cumbersome hardware configuration and a large amount of measurement overhead. This paper designs a Ceph distributed storage system based on Software-Defined Networking (SDN) technology and improves the traditional storage node selection strategy of Ceph. First, it uses SDN to obtain the network and node load status, which simplifies network configuration and reduces measurement overhead. It then establishes and solves a fuzzy multi-attribute decision-making model that comprehensively considers factors such as network and host state. Solving the model yields the correlation degree of each node, and the weight of each storage node is adjusted accordingly to carry out storage node selection.

(2) Node repair: This paper proposes a genetic-algorithm-based scheme for constructing the repair tree of a failed node in distributed storage. The scheme considers the network topology and link bandwidth of the storage system when constructing the solution. In the genetic algorithm, encoding, crossover, and mutation operators tailored to the problem are designed to search for the optimal repair tree with the largest bottleneck path bandwidth, and an approximate global optimal solution of the problem is obtained.
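To make the repair-tree objective mentioned in the abstract more concrete, the following Python sketch scores a candidate repair tree by its bottleneck bandwidth, that is, the slowest link the tree uses. It is only an illustration under stated assumptions: the node names, the bandwidth values, the parent-map representation of a tree, and the function names are all hypothetical and are not taken from the thesis. A genetic algorithm could plausibly use a score of this kind as its fitness value, but the thesis's actual encoding and operators are not described here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical undirected link bandwidths (Mbit/s) between storage nodes.
Bandwidth = Dict[Tuple[str, str], float]
# A candidate repair tree: each helper node maps to the node it sends data to.
# The root (the replacement node) has no entry.
Tree = Dict[str, str]


def link_bandwidth(bw: Bandwidth, a: str, b: str) -> float:
    """Look up an undirected link; raises KeyError if a and b are not adjacent."""
    if (a, b) in bw:
        return bw[(a, b)]
    return bw[(b, a)]


def bottleneck_bandwidth(tree: Tree, bw: Bandwidth) -> float:
    """Return the smallest link bandwidth used by the tree.

    This function does not check that `tree` is a valid tree;
    that is assumed to be guaranteed by whoever generates the candidates.
    """
    return min(link_bandwidth(bw, child, parent) for child, parent in tree.items())


def best_tree(candidates: Iterable[Tree], bw: Bandwidth) -> Tree:
    """Pick the candidate whose slowest link is fastest."""
    return max(candidates, key=lambda t: bottleneck_bandwidth(t, bw))


# Illustrative data only: "new" is the replacement node, a/b/c are helpers.
bw = {
    ("new", "a"): 100.0,
    ("new", "b"): 40.0,
    ("a", "b"): 80.0,
    ("a", "c"): 60.0,
    ("b", "c"): 90.0,
}
star = {"a": "new", "b": "new", "c": "a"}   # uses links new-a, new-b, a-c
relay = {"a": "new", "b": "a", "c": "b"}    # uses links new-a, a-b, b-c

print(bottleneck_bandwidth(star, bw))   # 40.0
print(bottleneck_bandwidth(relay, bw))  # 80.0
```

In this toy topology the relay tree avoids the slow 40 Mbit/s link between `new` and `b`, so `best_tree([star, relay], bw)` returns `relay`. Repair time in a real system would also depend on how much data each link carries, which this sketch deliberately ignores.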
Keywords/Search Tags:Distributed storage system, Software-Defined Networking, Fuzzy multi-attribute decision making, Regenerating code, Genetic algorithm