Research On Reading And Writing Balanced Data Distribution In Distributed Storage System

Posted on:2020-12-11

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Wang

GTID:2428330575469953

Subject:Software engineering

Abstract/Summary:

To cope with growing data volumes, growing storage clusters, and increasingly stringent read and write performance requirements, distributed storage systems have been widely adopted in the data storage industry. In a distributed storage system, redundancy is necessary to keep data available, and replication is a commonly used redundancy method. Because a large-scale system holds an extremely large volume of data, placing replicas without a reasonable and efficient algorithm can cause a large loss of performance across the whole system. The data distribution algorithms currently in wide use include the CRUSH algorithm and the Kinesis algorithm. CRUSH is designed to minimize data movement when the cluster changes, while Kinesis is designed to use system resources more evenly through relatively flexible selection of storage locations. However, neither of these algorithms addresses read and write balance. Without it, load is uneven between the nodes of a cluster during fault recovery and data migration, which increases the time these operations take.

We therefore aim to find a data distribution method for distributed storage systems that satisfies the read and write balance requirement, guarantees that multiple replicas of the same data are never placed on the same node, and automatically adjusts replica placement as the cluster expands so that read and write balance is maintained. Targeting the shortcomings of the CRUSH algorithm, we propose a new data distribution method that achieves read and write balance of data replicas and copes with dynamic changes and expansion of the cluster. Because the resulting distribution is balanced for reads and writes, data migration time is reduced when the cluster changes.

The main work of this thesis is summarized as follows. First, through experiments we found that the replica distribution algorithms commonly used in distributed storage systems, such as CRUSH, produce data distributions in which reads and writes are unbalanced. Starting from this problem, we show that unbalanced reads and writes increase the system's reconstruction time, which raises the risk of a secondary node failure and of permanent data loss. In addition, prolonged reconstruction takes running resources away from external applications. This demonstrates the importance of read and write balance to the whole system and motivates our research goals.

Second, to address the data distribution problem in CRUSH, we use mathematical analysis and theoretical verification to propose a data distribution method that meets the read and write balance criterion. Distributing data and its replicas in the specified way significantly improves the degree of read and write balance. In the proposed layout, data is numbered in the order in which it arrives at the storage system, and its distribution is summarized as a special mathematical distribution over a matrix that is simple and easy to compute. After data arrives, the metadata server places the data and its replicas according to this read/write-balanced distribution.

Third, when the system changes dynamically as nodes are added or removed, the multiple-choice paradigm is used to select the optimal location for data reconstruction, and different nodes can be given different weights or flexible deployments according to system requirements. An independent data migration strategy for node addition is also proposed so that the data distribution remains balanced for reads and writes after the cluster changes.

Finally, the advantages of the proposed method are verified by theoretical analysis and simulation experiments. On a virtual cluster built with the NS2 simulator, the proposed method improves the degree of read and write balance by about 50% on average compared with the CRUSH algorithm and reduces failure reconstruction time by approximately 14%, while keeping cluster resource consumption at a comparable level.
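The abstract does not spell out the matrix distribution itself, so the Python sketch below only illustrates what read balance during recovery means, under stated assumptions. It compares a hypothetical layout, which numbers objects in arrival order and cycles through every combination of `replicas` distinct nodes, against a uniformly random layout. The random layout is a stand-in for pseudo-random, hash-based placement and is not CRUSH itself. A co-location matrix `M[a][b]` counts the objects stored on both node a and node b. When a node fails, its row shows how many lost objects each surviving node can supply during rebuild. All function names are illustrative and do not come from the thesis.

```python
import itertools
import random


def combinatorial_layout(num_objects, num_nodes, replicas):
    """Hypothetical layout: object i (numbered in arrival order) takes the
    i-th replica set from a cyclic enumeration of all node combinations,
    so no object ever has two replicas on the same node."""
    combos = list(itertools.combinations(range(num_nodes), replicas))
    return [combos[i % len(combos)] for i in range(num_objects)]


def random_layout(num_objects, num_nodes, replicas, seed=0):
    """Baseline: each replica set is drawn uniformly at random without
    repetition (a stand-in for hash-based placement, not CRUSH)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_nodes), replicas))
            for _ in range(num_objects)]


def colocation_matrix(layout, num_nodes):
    """M[a][b] = number of objects with a replica on both node a and node b."""
    m = [[0] * num_nodes for _ in range(num_nodes)]
    for replica_set in layout:
        for a, b in itertools.permutations(replica_set, 2):
            m[a][b] += 1
    return m


def recovery_imbalance(layout, num_nodes, failed):
    """Max/mean of the lost objects each surviving node holds a copy of.
    1.0 means the rebuild reads can be spread perfectly evenly."""
    row = colocation_matrix(layout, num_nodes)[failed]
    survivors = [row[n] for n in range(num_nodes) if n != failed]
    mean = sum(survivors) / len(survivors)
    if mean == 0:  # the failed node held no data, so nothing to rebuild
        return 1.0
    return max(survivors) / mean


if __name__ == "__main__":
    NODES, REPLICAS, OBJECTS = 6, 3, 200
    for name, layout in (
        ("combinatorial", combinatorial_layout(OBJECTS, NODES, REPLICAS)),
        ("random", random_layout(OBJECTS, NODES, REPLICAS)),
    ):
        print(name, round(recovery_imbalance(layout, NODES, failed=0), 3))
```

When the object count is a multiple of the number of combinations (here C(6, 3) = 20), every survivor shares exactly the same number of objects with the failed node. The combinatorial layout therefore reports an imbalance of 1.0, while the random layout generally reports a higher value. The sketch models only the read side of recovery. It also ignores node addition, which the thesis handles with its separate migration strategy.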

Keywords/Search Tags:

Replica placement, reading and writing balance, distributed storage system

Related items

1	Research And Experiment About The Data Replica Placement Algorithm In Cloud Storage System
2	Research On Strategy Of Data Replica Placement For Geo-distributed Cloud Storage Services
3	Research Of Replica Management Mechanism In Cloud Storage System
4	The Research On Data Replica Placement In Geo-distributed Cloud Storage System
5	Research On Key Technologies Of Distributed Storage In Cloud Computing
6	Research On Replica Placement And Selection Strategies In Heterogeneous Cluster Storage System For Big Data
7	Research Of Replica Placement Strategy In Cloud Storage System
8	Research On HDFS Replica Placement Management Policy And Retrieval Algorithm In Heterogeneous Storage Environment
9	Research On Optimization Of Big Data Storage Replica Strategy In Cloud Environment
10	Research On Distributed File Placement Algorithm Without Depending On Popularity Information