
Research On Data Balancing Placement Of HDFS

Posted on: 2015-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ding
GTID: 2308330503955613
Subject: Management Science and Engineering
Abstract/Summary:
As the Internet grows and information technology advances, people rely increasingly on cloud-based distributed file systems to store massive data sets. HDFS is a representative distributed file system; it uses replica management strategies to improve cluster availability and fault tolerance. Research on replica management is still at an early stage, however, and replica creation, replica consistency maintenance, and load balancing are important research topics in computer storage technology.

Replica creation covers the number of replicas, where they are placed, and when they are created. In HDFS, placement rules are implemented by the replica placement strategy. The default strategy assumes that the cluster is homogeneous and selects datanodes at random to hold replicas. It does not consider a datanode's available storage space: even a datanode with far less available space than the others keeps the same probability of receiving new replicas. If the client is itself a datanode in the HDFS cluster, the default strategy stores the first replica on that client, so a client that uploads files frequently quickly runs low on available space. The default strategy therefore produces uneven available storage space across datanodes, and on datanodes that run short of space, replica writes or MapReduce tasks may fail. Nor does the default strategy balance net load across datanodes: it cannot steer replicas toward datanodes with low net load, and so cannot reduce the time spent waiting to store block replicas.

This thesis therefore studies the HDFS replica placement strategy. The main work and contributions are as follows:

(1) An available-storage-space-sensitive replica placement strategy is proposed.
To address the default HDFS replica placement strategy's failure to match the number of block replicas on a datanode to its available storage space, the improved strategy collects each datanode's available storage space and current number of access connections. A mathematical evaluation model combines these two quantities into an evaluation value for each datanode, and the namenode uses this value as its selection criterion, choosing the best-scoring datanode in the cluster. The experimental results show that the strategy matches each datanode's block replica count to its available storage space, avoids datanodes that are short of available space, and greatly reduces the likelihood that block replica writes or MapReduce tasks fail because available storage space is too small.

(2) A net-load-sensitive replica balancing strategy is proposed. Under heavy file read and write access, the default strategy cannot spread network traffic across other datanodes to balance net load. The improved strategy periodically measures the amount of data each datanode sends and receives over a period of time, and it considers two targets: a datanode's net load and its available storage space. When the average available storage space of the high-net-load datanodes exceeds that of the low-net-load datanodes by more than 5 GB, the strategy selects the datanode with the most available storage space to store the replica. Otherwise, it randomly selects a low-net-load datanode. The experimental results show that, by comparing the average available storage space of low- and high-net-load datanodes, the strategy balances net load and reduces the time spent waiting to store data blocks caused by high net load.
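The selection rule of the net load sensitive strategy can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the thesis's implementation. The `DataNode` record, the `choose_target` function, and the field names are hypothetical. The abstract does not say how datanodes are classified as high or low net load, so the sketch assumes a node counts as high load when its traffic in the last sampling window exceeds the cluster mean. It also assumes that the 5G gap means 5 GiB and that the node with the most available space is picked from the whole cluster.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Sequence

GIB = 1024 ** 3
SPACE_GAP_BYTES = 5 * GIB  # the "5G" gap from the abstract, assumed to mean 5 GiB


@dataclass(frozen=True)
class DataNode:  # hypothetical record, not an HDFS class
    name: str
    free_bytes: int     # available storage space
    traffic_bytes: int  # bytes sent plus received in the last sampling window


def choose_target(nodes: Sequence[DataNode], rng=random) -> DataNode:
    """Pick the datanode that should receive the next block replica."""
    if not nodes:
        raise ValueError("no datanodes available")

    # Assumption: "high net load" means traffic above the cluster mean.
    cutoff = mean(n.traffic_bytes for n in nodes)
    high = [n for n in nodes if n.traffic_bytes > cutoff]
    low = [n for n in nodes if n.traffic_bytes <= cutoff]  # never empty

    if not high:
        # Every node carries the same traffic, so all count as low load.
        return rng.choice(low)

    gap = mean(n.free_bytes for n in high) - mean(n.free_bytes for n in low)
    if gap > SPACE_GAP_BYTES:
        # Busy nodes hold much more free space: favour storage balance.
        # Assumption: the candidate set is the whole cluster.
        return max(nodes, key=lambda n: n.free_bytes)

    # Otherwise favour net load balance with a random low-load node.
    return rng.choice(low)
```

In a cluster where the only busy datanode also has far more free space than the rest, the function returns that datanode. When the gap is within the threshold, it returns one of the lightly loaded datanodes at random. Passing a seeded `random.Random` as `rng` makes the random branch reproducible.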
Keywords/Search Tags:HDFS, block replica, available storage space, net load