
Research On Data Management Technology Of Distributed Storage System

Posted on: 2022-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y X N Hu
Full Text: PDF
GTID: 2518306605990079
Subject: Master of Engineering
Abstract/Summary:
The Hadoop Distributed File System (HDFS) is an open-source implementation of the Google File System and has been used by Amazon, Yahoo, Facebook and other companies for large-scale data storage. However, the current HDFS system uses multiple metadata nodes to manage metadata. These nodes do not communicate with each other; each is independent and holds only part of the metadata. Extremely frequent access to a particular piece of metadata therefore increases the load on the metadata node that holds it and reduces the performance of the storage system, so this architecture does not solve the problem of metadata load balancing.

Moreover, to improve the reliability and availability of data, HDFS uses the "rack-aware" method to place replicas of file data directly on data nodes. This method does not fully consider the load characteristics of the data node on which a replica is placed. It may overload some data nodes and leave others underused, which unbalances the data distribution of the cluster and greatly reduces its operating efficiency.

In recent years, the uneven distribution of metadata and file data in distributed file systems has attracted wide attention from researchers. This thesis investigates in detail the load balancing of metadata nodes and the data distribution across data nodes in the HDFS distributed storage system. The main research results of the thesis are:

1. The basic concepts and properties of reinforcement learning and its application to practical problems are summarized. Several important metadata load balancing methods for distributed storage systems are described, and the main data distribution methods for distributed storage systems are summarized.

2. Existing load balancing algorithms for the HDFS architecture do not consider the heterogeneity of metadata servers. To address this, the thesis proposes a dynamic metadata load balancing algorithm based on reinforcement learning, which combines the classic congestion control method of FAST TCP with a reinforcement learning model. Experimental results show that the proposed algorithm, DBDM, improves metadata server performance compared with ADMLB.

3. The data distribution algorithm of the HDFS architecture does not consider the size and access delay of the data itself. To address this, the thesis proposes a data distribution algorithm based on data block preference. Experimental results show that, compared with the HDFS data distribution method, the proposed algorithm allocates data better according to the preference of data blocks and improves data storage performance.
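
To make the idea of placing replicas according to node load and block characteristics more concrete, here is a minimal Python sketch. It is not the algorithm proposed in the thesis, whose details are not given in this abstract. The `DataNode` fields, the `score` formula, and the `latency_weight` parameter are all assumptions chosen for illustration: each node is scored by a weighted mix of remaining capacity and access latency, and replicas are spread over distinct racks where possible.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    # Hypothetical node description; field names are illustrative only.
    name: str
    rack: str
    free_bytes: int
    latency_ms: float


def score(node, block_size, latency_weight):
    """Return a score in [0, 1] (higher is better), or None if the block does not fit."""
    if node.free_bytes <= 0 or node.free_bytes < block_size:
        return None
    # Fraction of the node's free space that remains after storing the block.
    headroom = (node.free_bytes - block_size) / node.free_bytes
    # Lower latency maps to a value closer to 1.
    speed = 1.0 / (1.0 + node.latency_ms)
    return (1.0 - latency_weight) * headroom + latency_weight * speed


def place_replicas(nodes, block_size, latency_weight, replicas=3):
    """Pick up to `replicas` node names, preferring high scores and distinct racks."""
    scored = [(score(n, block_size, latency_weight), n) for n in nodes]
    ranked = sorted(
        [(s, n) for s, n in scored if s is not None],
        key=lambda pair: pair[0],
        reverse=True,
    )
    chosen, used_racks = [], set()
    # First pass: at most one replica per rack, best score first.
    for _, node in ranked:
        if len(chosen) == replicas:
            break
        if node.rack not in used_racks:
            chosen.append(node)
            used_racks.add(node.rack)
    # Second pass: if there are fewer racks than replicas, reuse racks.
    for _, node in ranked:
        if len(chosen) == replicas:
            break
        if node not in chosen:
            chosen.append(node)
    return [n.name for n in chosen]
```

In this sketch a block that "prefers" low access delay would be placed with a `latency_weight` close to 1, while a large, rarely read block would use a value close to 0 so that remaining capacity dominates the choice.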
Keywords/Search Tags: distributed storage, load balancing, metadata, data distribution, policy-gradient