
Research on Replica Strategies for the Distributed Parallel File System HDFS

Posted on: 2014-07-21    Degree: Master    Type: Thesis
Country: China    Candidate: C F Huang    Full Text: PDF
GTID: 2268330425451887    Subject: Computer software and theory
Abstract/Summary:
In recent years, with the continued development of science and technology, the global scale of data has been growing rapidly. Web 2.0 emphasizes user interaction and has turned Internet users from readers into creators. Faced with such massive volumes of information, traditional storage systems cannot keep up with this high-speed growth; they hit capacity and performance bottlenecks, such as limits on the number of hard drives and servers.

HDFS (Hadoop Distributed File System) differs from traditional distributed parallel file systems. Running on inexpensive commodity machines, it is a new kind of distributed file system with high throughput, high fault tolerance, and high reliability, providing distributed data storage and management along with high-performance data access and interaction.

In HDFS, the replica is an important component: replica technology coordinates network node resources to complete large workloads efficiently. It accomplishes this through replica placement, replica selection, replica adjustment, and related mechanisms, improving the effective transmission of data between resource nodes.

This paper first surveys current replica management strategies, summarizing prior research achievements in the field and their limitations. It then analyzes the HDFS system architecture, I/O mechanism, and related aspects to establish a dynamic replica management model for HDFS, discussing replica placement and de-duplication. On this basis, the replica placement algorithm is improved: a modified replica placement strategy based on distance and load is proposed, together with a balance factor that adjusts the relative weight of distance and load to meet the requirements of different users.
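The distance-and-load placement idea with a balance factor can be sketched as follows. This is an illustrative assumption, not the thesis's exact formulas: the utility function, the normalization, and the names `Node`, `utility`, and `select_best_node` are all hypothetical.

```python
# Hypothetical sketch of distance/load-based replica placement with a
# balance factor alpha. The utility formula below is an illustrative
# assumption; the thesis's exact evaluation function may differ.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    distance: float   # network distance to the writer (e.g. hops); lower is better
    bandwidth: float  # available transmission bandwidth (MB/s); higher is better
    load: float       # current node load in [0, 1]; lower is better

def utility(node: Node, replica_size: float, alpha: float) -> float:
    """Combine transfer cost and load into one score.

    alpha is the balance factor: alpha -> 1 favors short transfer
    distance/time, alpha -> 0 favors lightly loaded nodes.
    """
    transfer_cost = node.distance + replica_size / node.bandwidth
    distance_score = 1.0 / (1.0 + transfer_cost)  # higher is better
    load_score = 1.0 - node.load                  # higher is better
    return alpha * distance_score + (1.0 - alpha) * load_score

def select_best_node(nodes, replica_size, alpha=0.5):
    """Pick the node with the highest utility value."""
    return max(nodes, key=lambda n: utility(n, replica_size, alpha))

nodes = [
    Node("dn1", distance=2, bandwidth=100, load=0.9),
    Node("dn2", distance=4, bandwidth=100, load=0.1),
]
# With a balanced alpha, the lightly loaded dn2 wins despite its longer distance.
best = select_best_node(nodes, replica_size=64.0, alpha=0.5)
```

Raising `alpha` toward 1 makes the closer but busier node win instead, which is how a single tunable parameter can serve users with different priorities.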
At the same time, according to the demands of the replica adjustment stage, the replica de-duplication strategy is improved: a replica evaluation function and a de-duplication strategy based on replica valuation are proposed. Finally, the validity of the proposed replica strategies is verified through simulation experiments and comparison against the HDFS default replica strategy.

The main contributions of this paper are:

1) An analysis of the differences between HDFS and traditional distributed systems, focusing on a comparative analysis with GFS: the design ideas and principles of both systems, and the similarities and differences of their replica management strategies. The analysis shows that HDFS is a simplified design of GFS with more flexible operation.

2) A replica placement strategy based on distance and load information. It replaces the random placement algorithm in HDFS, considering three influencing factors: replica size, transmission bandwidth, and node load. By calculating a utility value for each node, the best node is selected. Simulation experiments validate the superiority of the algorithm for load balancing.

3) A de-duplication strategy based on replica valuation. When a replica write operation is requested, the Namenode randomly obtains a set of Datanodes, selects a node, and writes the data. If the selected node already holds many replicas and is heavily loaded, performance suffers, yet the HDFS default replica adjustment strategy ignores node status. The improved strategy calculates a value for each replica and phases out the minimum-value replica to free up storage space. Simulation results show that this strategy outperforms the HDFS default strategy in large-file write tests.
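The valuation-based de-duplication idea can be sketched as: score every replica, then discard the lowest-scoring one when space is needed. The value function below (access frequency weighted against node load) and the names `Replica`, `replica_value`, and `evict_lowest_value` are illustrative assumptions; the thesis's actual evaluation function may weigh different factors.

```python
# Illustrative sketch of valuation-based replica de-duplication. The value
# function here is an assumption for illustration: frequently accessed
# replicas hosted on lightly loaded Datanodes are considered most valuable.
from dataclasses import dataclass

@dataclass
class Replica:
    block_id: str
    access_count: int  # recent read requests served by this replica
    node_load: float   # load of the hosting Datanode, in [0, 1]

def replica_value(r: Replica) -> float:
    """Higher access frequency and a lighter host raise a replica's value."""
    return r.access_count * (1.0 - r.node_load)

def evict_lowest_value(replicas, min_replicas=2):
    """Phase out the minimum-value replica, keeping at least min_replicas
    copies so fault tolerance is preserved. Returns (kept, victim)."""
    if len(replicas) <= min_replicas:
        return replicas, None
    victim = min(replicas, key=replica_value)
    return [r for r in replicas if r is not victim], victim

replicas = [
    Replica("blk_1", access_count=10, node_load=0.2),  # value 8.0
    Replica("blk_1", access_count=3,  node_load=0.5),  # value 1.5
    Replica("blk_1", access_count=7,  node_load=0.9),  # value 0.7 -> evicted
]
kept, victim = evict_lowest_value(replicas)
```

Unlike the default strategy, which ignores node status when trimming excess replicas, this sketch ties eviction to both usage and host load.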
Keywords/Search Tags: HDFS, Replica Management, Replica Value, Load Balance