
Research On Data Replication Strategy In Cloud Storage System

Posted on: 2018-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Han
Full Text: PDF
GTID: 2348330518498934
Subject: Engineering
Abstract/Summary:
With the rapid development of internet technology, online news, we-media, social software and live-streaming platforms have come into wide use. Because files and multimedia data are growing rapidly, efficient and secure data storage has become an urgent problem. Cloud storage systems address mass data storage through a distributed architecture and large clusters, and they offer high reliability, strong scalability and a better price-performance ratio. Within cloud storage research, replica technology is especially important because it improves the availability and reliability of the system. This thesis studies three aspects of replica management in Hadoop: optimal replica placement, dynamic adjustment of the replica count, and fast replica recovery. The aim is to address the shortcomings of the default HDFS strategy, namely inadequate resource utilization, unbalanced load and excessive data migration cost. The main research contents are as follows.

The first strategy is a replica placement strategy based on node evaluation. It takes the node evaluation value as its objective and uses four evaluation indicators: the node's storage load rate, the node's correct response rate, the ratio of transmission bandwidth between nodes, and the percentage of used nodes in the rack. A multi-objective optimization method assigns weights to the four indicators, and the normalized indicators are then combined to obtain the node evaluation value. When replicas are placed, the highest-scoring node in the local rack stores the master replica, and the highest-scoring node in the remote rack with the highest overall evaluation is chosen to store the other replicas. Simulation results show that this strategy keeps the cloud storage cluster better load-balanced, makes node selection efficient, and increases read and write speed.

The second strategy is a dynamic replica number management strategy based on file heat. User access to files usually follows Zipf's law, and the strategy builds on this observation. First, it collects the access frequency of each file over the statistical cycle and computes a smoothed value from the access frequencies of recent cycles. It then calculates a weight from the size of the file relative to the block size and derives the file heat value for the current cycle. Finally, a set of heat thresholds is defined, and each interval between adjacent thresholds corresponds to a different number of replicas. The interval into which the heat value falls determines the target number of replicas, which is compared with the number of existing replicas, and the replicas are adjusted accordingly. Simulation results show that the strategy effectively improves user access speed and reduces the average response time.

The third strategy is a fast replica recovery strategy for load balancing. In cloud storage, node failure is a normal occurrence. Failing to restore missing replicas promptly lowers reliability, while restoring all of them at once degrades system performance. The strategy is based on the heat of the replicas on the failed nodes: replicas with high heat are recovered first. Several source nodes may hold a replica that is pending recovery, so each node's throughput and its file access response time in the recent statistical period are used to calculate a load value, and the source node with the minimum load value is selected as the optimal source node. The target node for recovery is selected using a double-loop structure: the optimal target node is close to the optimal source node but is not a source node for another replica. Simulation results show that the strategy improves the reliability of the cloud storage system, has a smaller impact on cluster load, and improves the response speed of user requests.
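
The abstract does not give formulas, so the following Python fragment is only an illustrative sketch of the first two strategies under stated assumptions, not the thesis's actual method. The weights, the use of exponential smoothing, the threshold values and all function names are hypothetical; the base replica count of 3 mirrors the default HDFS replication factor.

```python
from bisect import bisect_right

# Hypothetical weights for the four indicators named in the abstract.
# The thesis obtains its weights by multi-objective optimization (not shown).
WEIGHTS = {
    "storage_load": 0.4,
    "response_rate": 0.3,
    "bandwidth": 0.2,
    "rack_usage": 0.1,
}


def node_score(ind):
    """Weighted sum of indicators that are already normalized to [0, 1].

    Assumption: load-type indicators are inverted so a higher score is better.
    """
    return (
        WEIGHTS["storage_load"] * (1.0 - ind["storage_load"])
        + WEIGHTS["response_rate"] * ind["response_rate"]
        + WEIGHTS["bandwidth"] * ind["bandwidth"]
        + WEIGHTS["rack_usage"] * (1.0 - ind["rack_usage"])
    )


def file_heat(access_counts, size_weight, alpha=0.5):
    """Smooth per-cycle access counts (oldest first) and scale by a size weight.

    Assumption: simple exponential smoothing stands in for the unspecified
    smoothing step; size_weight stands in for the file-size/block-size weight.
    """
    smooth = float(access_counts[0])
    for count in access_counts[1:]:
        smooth = alpha * count + (1.0 - alpha) * smooth
    return size_weight * smooth


def replica_count(heat, thresholds=(10.0, 50.0, 200.0), base=3):
    """Map a heat value to a replica count: one extra replica per threshold reached.

    The threshold values are made up for illustration.
    """
    return base + bisect_right(thresholds, heat)


# Usage: pick the best node in a rack, then size the replica set for a file.
rack = {
    "node-a": {"storage_load": 0.5, "response_rate": 1.0, "bandwidth": 0.5, "rack_usage": 0.0},
    "node-b": {"storage_load": 0.9, "response_rate": 0.8, "bandwidth": 0.5, "rack_usage": 0.0},
}
best_node = max(rack, key=lambda name: node_score(rack[name]))
target_replicas = replica_count(file_heat([4, 8], size_weight=1.0))
```

Keeping the thresholds in a sorted tuple lets the interval lookup be a single binary search, which matches the abstract's description of heat intervals mapping to replica counts.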
Keywords/Search Tags:Cloud storage, Replica strategy, Load balance, HDFS