
Research On Distributed Storage Of Replicas Based On Hadoop

Posted on: 2016-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: H C Feng
GTID: 2308330461487802
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of data being generated has grown exponentially, and handling such large amounts of data has become a major research topic in computer science. HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware. It is highly fault tolerant, can be deployed on inexpensive machines, and provides high-throughput data access, which makes it well suited to applications with large data sets. HDFS relaxes some POSIX constraints so that file system data can be read as a stream.

However, as a cloud storage system that is still being developed and improved, HDFS has shortcomings in how it manages data storage. Under its default management strategy, the number of replicas is fixed and the nodes that store the data are selected randomly. This can lead to load imbalance across the cluster and degrade the performance of the whole cluster. This thesis analyzes the Hadoop Distributed File System and, drawing on the relevant theory of cloud storage, improves the existing replica management strategy of HDFS. The main contributions are as follows:

(1) An improved replica placement strategy. By default, HDFS places replicas randomly. Node performance, however, is not uniform: some nodes are idle, some are heavily loaded, and newly joined nodes can leave the load uneven under random selection. The improved strategy evaluates the performance and load of each node and selects the best node for each replica according to the resulting value. This balances the load across the cluster and improves cluster performance.

(2) An improved replica creation strategy. The improved strategy analyzes each file's heat (how frequently it is accessed) and its recent accesses, and dynamically adjusts the number of replicas according to the trend in heat and access count. The default number of replicas is calculated from the required replica availability. Together these changes further improve system performance and efficiency.

(3) Experimental verification. A Hadoop Distributed File System environment was set up, and the improved replica placement and creation strategies were verified experimentally. The results show that the improved strategies make fuller use of the performance of each node in the cluster, improve the efficiency of the system, and achieve better load balancing.
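The abstract does not give the thesis's actual formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the author's implementation. It assumes a node's suitability is a weighted sum of a performance score and its free capacity, that the baseline replica count follows the common independent-failure model 1 - (1 - p)^r >= target, and that the replica count moves one step at a time with the heat trend. All names, weights, and thresholds (`Node`, `node_score`, `choose_nodes`, `min_replicas`, `adjust_replicas`) are hypothetical and are not part of the HDFS API.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_score: float  # hypothetical performance score, higher is better
    load: float            # hypothetical current utilization in [0, 1]


def node_score(node, w_perf=0.5, w_load=0.5):
    """Illustrative suitability value: reward performance and idle capacity."""
    return w_perf * node.capacity_score + w_load * (1.0 - node.load)


def choose_nodes(nodes, replicas):
    """Pick the highest-scoring nodes instead of selecting them at random."""
    ranked = sorted(nodes, key=node_score, reverse=True)
    return [n.name for n in ranked[:replicas]]


def min_replicas(node_availability, target):
    """Smallest r with 1 - (1 - p)^r >= target, assuming independent failures."""
    r = math.log(1.0 - target) / math.log(1.0 - node_availability)
    return max(1, math.ceil(r))


def adjust_replicas(current, base, heat_now, heat_prev, max_replicas=10):
    """Add a replica when heat rises, drop one when it falls, never below base."""
    if heat_now > heat_prev:
        return min(current + 1, max_replicas)
    if heat_now < heat_prev:
        return max(current - 1, base)
    return current


if __name__ == "__main__":
    cluster = [Node("a", 0.9, 0.8), Node("b", 0.5, 0.1), Node("c", 0.7, 0.5)]
    base = min_replicas(node_availability=0.95, target=0.999)
    print(choose_nodes(cluster, 2), base, adjust_replicas(base, base, 120, 80))
```

In this sketch a fast but busy node such as "a" can rank below a slower idle node, which is the load-balancing effect the placement strategy aims for. A real deployment would need measured node metrics and a tuned weighting rather than the fixed illustrative values used here.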
Keywords/Search Tags: cloud storage, replica placement, replica creation, load balancing