
Improvement And Implementation Of The Default Replica Selection Mechanism Based On HDFS

Posted on: 2018-10-26    Degree: Master    Type: Thesis
Country: China    Candidate: L Zhao    Full Text: PDF
GTID: 2348330512482138    Subject: Computer technology
Abstract/Summary:
With the rapid development of human society and the growing use of technology in daily life, massive amounts of data are being generated around us at all times. Traditional data processing methods are no longer suitable for analyzing such massive data, and Hadoop was developed to address this problem. Hadoop has two core components: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce handles the analysis of massive data, while HDFS stores and manages it. The replica selection mechanism of HDFS directly affects the reliability, availability, balance, and reading efficiency of data. The default replica selection mechanism of HDFS involves a certain randomness when choosing a replica's location, which leads to HDFS data imbalance and Hadoop cluster load imbalance. This paper seeks to improve the default HDFS replica selection mechanism, and the work mainly includes the following aspects:

Firstly, this paper selects five factors, namely the current CPU usage, memory usage, disk I/O usage, disk usage, and bandwidth usage of a data node, to describe its load condition in the cluster. These five factors lay the foundation for quantifying the data node's load. Each factor is further given a different weight according to its influence on the data node's load, and the load of a data node is quantified as a single value.

Secondly, this paper analyzes the periodic heartbeat mechanism from DataNode to NameNode. The heartbeat is used to report the load factors of each data node to the NameNode, so that the NameNode can grasp the load status of all data nodes in the cluster. Based on an analysis of the default replica selection mechanism of HDFS, an improved replica selection method is proposed that addresses the shortcomings of the default mechanism and takes into account the load of individual data nodes, the load of racks, and the average load of the cluster, as illustrated in the sketch below.

Finally, the modified HDFS source code is compiled and a Hadoop cluster environment is built. This paper verifies the improved HDFS replica selection mechanism for placements with fewer than three replicas, exactly three replicas, and more than three replicas. The experimental results show that the improved replica selection mechanism can select the best replica locations to ensure data reliability and availability and to balance the distribution of data in the cluster. According to the data nodes' load conditions, it also improves the speed of reading and writing data and the load balancing of the cluster.
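The abstract does not give the exact weight values or the precise selection rule, so the following is only a minimal Java sketch under assumed weights and normalized usage values in [0, 1]. It illustrates the two ideas described above: quantifying a data node's load as a weighted sum of the five factors, and preferring the least-loaded node whose load does not exceed the cluster average. It does not use any real Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Illustrative sketch of weighted load quantification and load-aware
 * replica-target choice. Weights and factor values are assumptions made
 * for this example, not the values used in the thesis.
 */
public class ReplicaSelectionSketch {

    /** Snapshot of one data node's load factors, each normalized to [0, 1]. */
    static class NodeLoad {
        final String nodeId;
        final double cpu, memory, diskIo, diskUsage, bandwidth;

        NodeLoad(String nodeId, double cpu, double memory,
                 double diskIo, double diskUsage, double bandwidth) {
            this.nodeId = nodeId;
            this.cpu = cpu;
            this.memory = memory;
            this.diskIo = diskIo;
            this.diskUsage = diskUsage;
            this.bandwidth = bandwidth;
        }

        /** Weighted sum of the five factors; the weights here are hypothetical and sum to 1. */
        double quantifiedLoad() {
            return 0.25 * cpu + 0.20 * memory + 0.25 * diskIo
                 + 0.15 * diskUsage + 0.15 * bandwidth;
        }
    }

    /**
     * Choose the least-loaded node whose load is at or below the cluster average.
     * Returns null if every node is above average, in which case a caller would
     * fall back to the default placement policy.
     */
    static NodeLoad chooseTarget(List<NodeLoad> nodes) {
        double avg = nodes.stream()
                .mapToDouble(NodeLoad::quantifiedLoad)
                .average()
                .orElse(0.0);
        return nodes.stream()
                .filter(n -> n.quantifiedLoad() <= avg)
                .min(Comparator.comparingDouble(NodeLoad::quantifiedLoad))
                .orElse(null);
    }

    public static void main(String[] args) {
        // Example cluster snapshot as it might be assembled from heartbeat reports.
        List<NodeLoad> cluster = new ArrayList<>();
        cluster.add(new NodeLoad("dn1", 0.80, 0.70, 0.60, 0.50, 0.40));
        cluster.add(new NodeLoad("dn2", 0.30, 0.20, 0.25, 0.40, 0.10));
        cluster.add(new NodeLoad("dn3", 0.55, 0.60, 0.45, 0.35, 0.50));

        NodeLoad target = chooseTarget(cluster);
        System.out.println("Chosen replica target: "
                + (target == null ? "none (fall back to default policy)" : target.nodeId));
    }
}
```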
Keywords/Search Tags: HDFS, Replica selection, Heartbeat, Load balancing