
Improvement And Implementation Of The Default Replica Selection Mechanism Based On HDFS

Posted on: 2018-10-26    Degree: Master    Type: Thesis
Country: China    Candidate: L Zhao    Full Text: PDF
GTID: 2348330512482138    Subject: Computer technology
Abstract/Summary:
With the rapid development of human society and the growing use of technology in daily life, massive amounts of data are being generated around us at all times. Traditional data processing methods are no longer suitable for analyzing such massive data, and Hadoop was developed to address this problem. Hadoop has two core components: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce handles the analysis of massive data, while HDFS stores and manages it. The replica selection mechanism of HDFS directly affects the reliability, availability, balance, and reading efficiency of data. The default replica selection mechanism of HDFS involves a certain randomness when choosing a replica's location, which leads to HDFS data imbalance and Hadoop cluster load imbalance. This paper seeks to improve the default HDFS replica selection mechanism, and the work mainly includes the following aspects:

Firstly, this paper selects five factors, namely the current CPU usage, memory usage, disk I/O usage, disk usage, and bandwidth usage of a data node, to describe its load condition in the cluster. These five factors lay the foundation for quantifying the data node's load. Each factor is further given a different weight according to its influence on the data node's load, and the load of a data node is quantified as a single value.

Secondly, this paper analyzes the periodic heartbeat mechanism from DataNode to NameNode. The heartbeat is used to report the load factors of each data node to the NameNode, so that the NameNode can grasp the load status of all data nodes in the cluster. Based on an analysis of the default replica selection mechanism of HDFS, an improved replica selection method is proposed that addresses the shortcomings of the default mechanism and takes into account the load of individual data nodes, the load of racks, and the average load of the cluster, as illustrated in the sketch below.

Finally, the modified HDFS source code is compiled and a Hadoop cluster environment is built. This paper verifies the improved HDFS replica selection mechanism for placements with fewer than three replicas, exactly three replicas, and more than three replicas. The experimental results show that the improved replica selection mechanism can select the best replica locations to ensure data reliability and availability and to balance the distribution of data in the cluster. According to the data nodes' load conditions, it also improves the speed of reading and writing data and the load balancing of the cluster.
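The abstract does not give the exact weight values or the precise selection rule, so the following is only a minimal Java sketch under assumed weights and normalized usage values in [0, 1]. It illustrates the two ideas described above: quantifying a data node's load as a weighted sum of the five factors, and preferring the least-loaded node whose load does not exceed the cluster average. It does not use any real Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative sketch of weighted load quantification and load-aware
 * replica-target choice. Weights and factor values are assumptions made
 * for this example, not the values used in the thesis.
 */
public class ReplicaSelectionSketch {

    /** Snapshot of one data node's load factors, each normalized to [0, 1]. */
    static class NodeLoad {
        final String nodeId;
        final double cpu, memory, diskIo, diskUsage, bandwidth;

        NodeLoad(String nodeId, double cpu, double memory,
                 double diskIo, double diskUsage, double bandwidth) {
            this.nodeId = nodeId;
            this.cpu = cpu;
            this.memory = memory;
            this.diskIo = diskIo;
            this.diskUsage = diskUsage;
            this.bandwidth = bandwidth;
        }

        /** Weighted sum of the five factors; the weights here are hypothetical and sum to 1. */
        double quantifiedLoad() {
            return 0.25 * cpu + 0.20 * memory + 0.25 * diskIo
                 + 0.15 * diskUsage + 0.15 * bandwidth;
        }
    }

    /**
     * Choose the least-loaded node whose load is at or below the cluster average.
     * Returns null if every node is above average, in which case a caller would
     * fall back to the default placement policy.
     */
    static NodeLoad chooseTarget(List<NodeLoad> nodes) {
        double avg = nodes.stream()
                .mapToDouble(NodeLoad::quantifiedLoad)
                .average()
                .orElse(0.0);
        return nodes.stream()
                .filter(n -> n.quantifiedLoad() <= avg)
                .min(Comparator.comparingDouble(NodeLoad::quantifiedLoad))
                .orElse(null);
    }

    public static void main(String[] args) {
        // Example cluster snapshot as it might be assembled from heartbeat reports.
        List<NodeLoad> cluster = new ArrayList<>();
        cluster.add(new NodeLoad("dn1", 0.80, 0.70, 0.60, 0.50, 0.40));
        cluster.add(new NodeLoad("dn2", 0.30, 0.20, 0.25, 0.40, 0.10));
        cluster.add(new NodeLoad("dn3", 0.55, 0.60, 0.45, 0.35, 0.50));

        NodeLoad target = chooseTarget(cluster);
        System.out.println("Chosen replica target: "
                + (target == null ? "none (fall back to default policy)" : target.nodeId));
    }
}
```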
Keywords/Search Tags: HDFS, Replica selection, Heartbeat, Load balancing