Research On Distributed Storage Of Replicas Based On Hadoop

Posted on:2016-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:H C Feng

Full Text:PDF

GTID:2308330461487802

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, the volume of data being generated has grown exponentially, and how to handle such massive data has become a hot research topic in computer science. HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware. It is highly fault-tolerant and can be deployed on low-cost machines. It provides high-throughput data access, which makes it well suited to applications with large data sets. HDFS also relaxes some POSIX requirements to allow streaming access to file system data.

However, as a cloud storage system that is still developing and improving, HDFS has shortcomings in how it manages data storage. Under the default management strategy the number of replicas is fixed, and the nodes that store the data are selected at random. This can lead to an unbalanced cluster load and so degrade the performance of the whole cluster. This thesis analyzes the Hadoop Distributed File System and, drawing on the relevant theory of cloud storage, improves the existing management strategies of HDFS. The main work is as follows:

(1) An improved replica placement strategy. By default, HDFS places replicas randomly. However, node performance is not uniform: some nodes in the cluster are idle, some are heavily loaded, and newly joined nodes can cause uneven load under random selection. The improved replica placement strategy evaluates the performance and load of each node and selects the best node for each replica according to the resulting value. The strategy balances the load across the cluster and improves cluster performance.

(2) An improved replica creation strategy. The improved strategy analyzes the heat (popularity) of each file and its recent accesses, dynamically adjusts the number of replicas according to the trend in heat and access, and calculates the default number of replicas from replica availability, further improving system performance and efficiency.

(3) Experimental verification. A Hadoop Distributed File System environment was set up, and the improved replica placement and creation strategies were verified by experiment. The experimental results show that the improved strategies make full use of the performance of each node in the cluster, improve the efficiency of the system, and achieve better load balancing across the cluster.
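The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general shape of the two ideas in items (1) and (2). Everything in it is an assumption made for illustration: the `Node` fields, the equal weights in `node_score`, the one-step heat rule in `adjust_replicas`, and the availability model in `min_replicas` (each replica is taken to be independently available with probability `p`, so `r` replicas give availability `1 - (1 - p)^r`). None of these names come from HDFS or from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float  # hypothetical capability score, normalized to [0, 1]
    load: float         # hypothetical current load, normalized to [0, 1]


def node_score(node: Node, w_perf: float = 0.5, w_load: float = 0.5) -> float:
    """Higher is better: reward capability, penalize current load (assumed weights)."""
    return w_perf * node.performance + w_load * (1.0 - node.load)


def pick_replica_nodes(nodes: list[Node], replicas: int) -> list[str]:
    """Choose the best-scoring nodes instead of selecting nodes at random."""
    ranked = sorted(nodes, key=node_score, reverse=True)
    return [n.name for n in ranked[:replicas]]


def min_replicas(p: float, target: float) -> int:
    """Smallest r with 1 - (1 - p)**r >= target, assuming independent replicas."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


def adjust_replicas(current: int, base: int, heat_now: float,
                    heat_before: float, max_replicas: int) -> int:
    """Add a replica when heat rises, drop one when it falls, stay in [base, max]."""
    if heat_now > heat_before:
        current += 1
    elif heat_now < heat_before:
        current -= 1
    return max(base, min(current, max_replicas))


if __name__ == "__main__":
    cluster = [
        Node("A", performance=0.9, load=0.8),  # fast but busy
        Node("B", performance=0.6, load=0.1),  # moderate and mostly idle
        Node("C", performance=0.4, load=0.5),  # slow and half loaded
        Node("D", performance=0.8, load=0.0),  # newly joined, empty
    ]
    base = min_replicas(p=0.9, target=0.995)
    print(pick_replica_nodes(cluster, base), base)
```

In this sketch the newly joined, empty node ranks first and the slow, half-loaded node is left out, which is the behavior item (1) describes at a high level. A real implementation would need measured performance and load metrics and would have to respect HDFS's own placement constraints, which the sketch ignores.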

Keywords/Search Tags:

cloud storage, replica placement, replica creation, load balancing

Related items

1	Research On Efficient Replica Management Strategy In Cloud Environment
2	Research And Experiment About The Data Replica Placement Algorithm In Cloud Storage System
3	Research Of Replica Management Mechanism For Integration Of Cloud-P2P Computing
4	Research On Optimization Of Big Data Storage Replica Strategy In Cloud Environment
5	Research And Implementation Of Replica Management Strategy In Cloud Storage Environment
6	The Research On Data Replica Management Strategy In Cloud Computing
7	Research On Strategy Of Data Replica Placement For Geo-distributed Cloud Storage Services
8	Research On Replica Placement And Selection Strategies In Heterogeneous Cluster Storage System For Big Data
9	Research Of Replica Management Mechanism In Cloud Storage System
10	Research On The Strategy Of Replica Management In HDFS