Research On Replica Selection Strategy And Replica Management Strategy Of Heterogeneous Storage HDFS

Posted on: 2018-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: S S Yang
Full Text: PDF
GTID: 2348330563952190
Subject: Computer technology
Abstract/Summary:
Big data is characterized by variety, velocity, volume, variability, and complexity. The Hadoop Distributed File System (HDFS) is one of the most widely used big data storage systems. HDFS is designed to run on commodity hardware and uses data replication to ensure data security and reliability. How to choose the best replica for an upper-layer application, and how to manage replicas, are the key questions of HDFS replica research.

The default HDFS replica strategy assumes homogeneous hardware. As cluster hardware is upgraded over time, however, homogeneous clusters gradually become heterogeneous. SSDs can bridge the performance gap between memory and HDDs, but they are smaller and more expensive than HDDs, so an SSD-only cluster costs too much. Given the cost and performance requirements of large-scale distributed storage systems, combining SSDs and HDDs has become an effective way to achieve high-capacity, high-performance data storage, and heterogeneous storage HDFS clusters that mix SSDs and HDDs are now common.

The default HDFS replica selection and replica management strategies have many deficiencies in heterogeneous storage HDFS. The default replica selection strategy is a nearest-replica strategy that always selects the nearest Datanode, so only network topology distance is considered. This single criterion ignores the performance of different storage devices and the load differences among Datanodes. As a result, a poorly performing or heavily loaded Datanode can easily end up serving heavy data access, which causes cluster load imbalance and inefficient data access. HDFS also uses a static replica management strategy. Static management has low maintenance cost but poor flexibility: the distribution of replicas in the cluster does not change, which is likely to waste high-performance storage resources.

To address these deficiencies, this thesis studies replica selection and replica management strategies that aim to improve SSD utilization and replica access efficiency in HDFS clusters with both SSDs and HDDs. The main contributions are as follows:

(1) Design and implementation of a replica selection strategy for heterogeneous storage HDFS. The thesis analyzes the shortcomings of the default HDFS replica selection strategy and summarizes the main factors that affect data access performance: the I/O load of the storage, CPU load, memory load, and network topology distance. From this analysis it derives a Datanode access performance evaluation model. During replica selection, the Datanode with the higher evaluation value is chosen whenever possible to serve the request, which improves data access performance in heterogeneous storage HDFS.

(2) Design and implementation of a dynamic two-level replica management strategy for heterogeneous storage HDFS. The first level adjusts replicas within a Datanode according to data access heat: replicas with higher access heat are moved to the SSD, and replicas with lower access heat are moved to the HDD. The second level adjusts replicas among Datanodes using a discrete particle swarm algorithm that borrows ideas from genetic algorithms. By giving priority to storing high-heat data on high-performance storage, it optimizes the distribution of data across the cluster and proactively adjusts how each Datanode's different storage devices are used, which increases the utilization of high-performance storage.

(3) Construction of a heterogeneous storage HDFS simulation environment, followed by performance testing and analysis. The environment extends the open source simulation platform CloudSim with HDFS. The experimental results show that the access speed of the proposed replica selection strategy is about 17% higher than that of the default HDFS replica selection strategy. The proposed dynamic two-level replica management strategy is also more effective than the static HDFS replica management strategy, improving replica read performance by at least 32%.
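
The abstract names the factors behind the Datanode access performance evaluation model but does not give its formula. The following is only a minimal sketch of the idea, assuming a simple weighted sum in which lower load and shorter topology distance give a higher score. The weights, the `MAX_DISTANCE` normalization, and all class and function names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ReplicaLocation:
    """One candidate replica, described by the four factors named in the abstract."""
    datanode: str
    io_load: float    # storage I/O load, normalized to 0..1 (1 = saturated)
    cpu_load: float   # CPU load, normalized to 0..1
    mem_load: float   # memory load, normalized to 0..1
    distance: int     # network topology distance from the client


# Hypothetical weights; the thesis does not publish its weighting here.
WEIGHTS = {"io": 0.4, "cpu": 0.2, "mem": 0.1, "dist": 0.3}
# Assumed upper bound used only to normalize distance into 0..1.
MAX_DISTANCE = 4


def evaluation_value(r: ReplicaLocation) -> float:
    """Higher is better: lightly loaded, nearby Datanodes score highest."""
    closeness = 1.0 - min(r.distance, MAX_DISTANCE) / MAX_DISTANCE
    return (
        WEIGHTS["io"] * (1.0 - r.io_load)
        + WEIGHTS["cpu"] * (1.0 - r.cpu_load)
        + WEIGHTS["mem"] * (1.0 - r.mem_load)
        + WEIGHTS["dist"] * closeness
    )


def select_nearest(replicas: list[ReplicaLocation]) -> ReplicaLocation:
    """Baseline: distance-only selection, as the abstract describes the default."""
    return min(replicas, key=lambda r: r.distance)


def select_by_evaluation(replicas: list[ReplicaLocation]) -> ReplicaLocation:
    """Sketch of multi-factor selection: pick the highest evaluation value."""
    return max(replicas, key=evaluation_value)


candidates = [
    # Local replica on a heavily loaded Datanode.
    ReplicaLocation("dn1", io_load=0.9, cpu_load=0.8, mem_load=0.7, distance=0),
    # Replica on a lightly loaded Datanode that is farther away.
    ReplicaLocation("dn2", io_load=0.1, cpu_load=0.2, mem_load=0.3, distance=2),
]

nearest = select_nearest(candidates)
best = select_by_evaluation(candidates)
```

With these assumed inputs, distance-only selection picks the busy local Datanode `dn1`, while the multi-factor score prefers the lightly loaded `dn2`. This is the kind of load imbalance the abstract attributes to the default strategy.
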
Keywords/Search Tags: HDFS, replica selection, replica management, CloudSim