Research On Data Replication Technology Based On HDFS Storage System

Posted on: 2019-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: T Fan
GTID: 2438330566983714
Subject: Computer software and theory
Abstract/Summary:
A cloud computing system is a distributed system, and distributed computing is the basic model of cloud computing. Distributed storage systems are usually used to support efficient distributed computing, and common data redundancy techniques, such as the three-replica strategy and erasure coding (Erasure Code), can improve the reliability, availability, and scalability of a distributed system. HDFS (Hadoop Distributed File System) is a distributed file system developed by the Apache Software Foundation. Its three-replica strategy consumes large amounts of storage and energy, and because the replica count is fixed, it cannot maintain high data availability when demand for a piece of data rises as the data becomes hot, which also leaves node load unbalanced. Erasure coding addresses the rapid consumption of storage resources, but it needs substantial network bandwidth to interact with the data nodes and download the blocks required for a file, and decoding those blocks back into the original data is costly in memory and CPU. This paper studies these problems in the HDFS storage system. Its main contributions are as follows.

First, a dynamic replica scheduling algorithm is designed around the factors that influence replica scheduling: the heat value of a file, its static influence factors, and its availability coefficient. By analyzing these factors, the algorithm determines the required number of replicas, compares it with the number of existing replicas, and adjusts the count dynamically to follow the change in replica demand caused by changes in heat. Adjusting the replica count in this way improves availability: when a node is overloaded, the algorithm, guided by file availability, proactively adds nodes on which to place replicas, which maintains high availability of the file and keeps the system load balanced. When the system is idle and the number of existing replicas exceeds the number required, removing replicas reduces resource consumption and improves the utilization of system resources.

Second, to address the high resource cost of recovering data blocks under erasure coding, this paper analyzes the factors that affect data read latency and system load balance, identifies system throughput as a representative indicator among the various performance metrics, uses node load as the decision criterion, and designs a scheduling algorithm for multiple user requests. The algorithm effectively reduces the average latency of data retrieval in an erasure-coded file storage system and optimizes load balance; it also improves the stability of data retrieval and gives users a better experience.

Finally, a hybrid storage strategy that combines replication and erasure coding is proposed on top of the HDFS distributed file system. The strategy uses erasure coding to improve data security and reduce storage cost, and uses the dynamic replica strategy to adjust node resource utilization while the cluster is running, balance the system load, and maintain high data availability. Experimental analysis shows that, compared with the original three-replica strategy of HDFS, the proposed hybrid storage strategy balances load better, lowers storage cost, improves security, and keeps the data highly available.
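The trade-off described in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the thesis's algorithm: the function names, the `heat_per_replica` parameter, and the replica bounds are hypothetical, and the Reed-Solomon layout with 6 data and 3 parity blocks is an assumed example, since the abstract names no specific coding parameters. It shows the raw-storage overhead of replication versus erasure coding, and a simple way to derive a replica adjustment from a file's heat value.

```python
import math


def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under plain replication."""
    return float(replicas)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per byte of user data under a (k data, m parity) code."""
    return (data_blocks + parity_blocks) / data_blocks


def required_replicas(heat: float, heat_per_replica: float,
                      min_replicas: int = 1, max_replicas: int = 6) -> int:
    """Hypothetical rule: one replica per `heat_per_replica` units of heat,
    clamped to [min_replicas, max_replicas]. Assumes heat_per_replica > 0."""
    needed = math.ceil(heat / heat_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def replica_adjustment(current: int, heat: float, heat_per_replica: float) -> int:
    """Positive result: add that many replicas; negative: remove that many."""
    return required_replicas(heat, heat_per_replica) - current


if __name__ == "__main__":
    print(replication_overhead(3))          # three-replica strategy
    print(erasure_overhead(6, 3))           # assumed 6 data + 3 parity layout
    print(replica_adjustment(3, 50, 100))   # cold file holding 3 replicas
    print(replica_adjustment(3, 520, 100))  # hot file holding 3 replicas
```

Under these assumptions, three-way replication stores 3 bytes per user byte, while the 6+3 layout stores 1.5, which is the storage saving the hybrid strategy aims for. The heat rule adds replicas only for files whose demand justifies them and removes them when demand falls. The thesis's actual algorithm also weighs static influence factors and an availability coefficient, which this sketch omits.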
Keywords/Search Tags: distributed storage system, data copy, Erasure Code, mixed storage