
Research On The Strategy Of Replica Management In HDFS

Posted on: 2016-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Chen
Full Text: PDF
GTID: 2308330479984817
Subject: Computer system architecture
Abstract/Summary:
As the foundation of cloud computing, cloud storage plays an increasingly important role in the era of big data. Cloud storage adopts a distributed architecture to cope with massive volumes of data, and improving its reliability and performance has become a hot research topic. Data security in cloud storage depends on replication technology: a data-management mechanism that stores multiple copies of each file and distributes them across several nodes to improve the reliability, load balance, and data-access efficiency of the whole storage system. This thesis aims to improve the service capability of cloud storage by addressing the replica creation and placement strategies of the Hadoop Distributed File System (HDFS).

Based on the characteristics of file access in a storage system, this thesis proposes a dynamic replica creation algorithm driven by access frequency. To overcome the shortcomings of the static replica creation method in HDFS, the algorithm takes each file's access frequency and access-time intervals into account, so that the cluster can dynamically adjust the number of replicas of each file. Frequently accessed files receive extra copies, which distribute access requests more evenly, eliminate hot spots, and thus effectively reduce the probability of single-point failures in the system. In addition, with multiple replicas available, a user's access request can be served from the nearest copy, effectively reducing network latency.
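The replica creation policy described above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the names (`FileStats`, `target_replicas`), the per-100-accesses growth rule, the one-hour decay window, and the bounds of 3 to 10 replicas are all assumptions chosen for the example (3 is the HDFS default replication factor).

```python
from dataclasses import dataclass

MIN_REPLICAS = 3    # HDFS default replication factor
MAX_REPLICAS = 10   # assumed cap to limit storage overhead

@dataclass
class FileStats:
    access_count: int = 0
    last_access: float = 0.0   # seconds since epoch

def record_access(stats: FileStats, now: float) -> None:
    """Update per-file statistics on each read request."""
    stats.access_count += 1
    stats.last_access = now

def target_replicas(stats: FileStats, now: float,
                    hot_threshold: int = 100,
                    decay_window: float = 3600.0) -> int:
    """Map a file's access frequency to a replica count.

    Files not accessed within `decay_window` seconds fall back to the
    minimum; hot files gain one extra replica per `hot_threshold`
    accesses, capped at MAX_REPLICAS.
    """
    if now - stats.last_access > decay_window:
        return MIN_REPLICAS
    extra = stats.access_count // hot_threshold
    return min(MIN_REPLICAS + extra, MAX_REPLICAS)
```

A cluster-level daemon would periodically compare `target_replicas` against each file's current replication factor and trigger replication or deletion accordingly; cold files drop back to the minimum, reducing cluster load as the abstract describes.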
For data with low access frequency, the algorithm allocates fewer replicas without affecting availability, effectively reducing the load in the cluster.

This thesis also proposes an SVM-based Replica Placement Model (SRPM). HDFS adopts a rack-aware replica placement strategy to cope with the storage of very-large-scale data and to improve fault tolerance. However, HDFS does not take the differences among node servers into account, which can cause load imbalance across the cluster. Moreover, HDFS chooses the remote-rack node for a replica at random, which may place replicas on distant nodes, so that data transfer between nodes consumes considerable time. To solve these problems, SRPM finds the best placement node for each replica by jointly considering the load of each node, the node's hardware performance, and the network distance between nodes. Experiments show that SRPM effectively improves load balancing in HDFS compared with the existing replica placement strategy.
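The node-selection idea behind SRPM can be illustrated with a simplified sketch. The thesis trains a support vector machine over node features; the sketch below replaces that learned decision function with a fixed linear score over the same three features (load, hardware performance, network distance), purely as an assumed stand-in. The function names, feature encoding, and weights are all hypothetical.

```python
def placement_score(load: float, hw_perf: float, distance: int,
                    w_load: float = 0.5, w_perf: float = 0.3,
                    w_dist: float = 0.2, max_hops: int = 6) -> float:
    """Score a candidate DataNode; higher is a better placement target.

    load     -- current utilisation in [0, 1] (lower is better)
    hw_perf  -- normalised hardware performance in [0, 1] (higher is better)
    distance -- network distance in hops from the writer (lower is better)
    """
    return (w_load * (1.0 - load)
            + w_perf * hw_perf
            + w_dist * (1.0 - min(distance, max_hops) / max_hops))

def choose_node(candidates: dict[str, tuple[float, float, int]]) -> str:
    """Pick the candidate DataNode with the highest placement score."""
    return max(candidates, key=lambda n: placement_score(*candidates[n]))
```

For example, given a heavily loaded nearby node, a lightly loaded nearby node, and a lightly loaded distant node, the lightly loaded nearby node wins, which matches the load-balancing and transfer-time goals the abstract attributes to SRPM.

```python
nodes = {
    "dn1": (0.9, 0.8, 2),   # heavily loaded, close
    "dn2": (0.2, 0.7, 2),   # lightly loaded, close
    "dn3": (0.2, 0.7, 6),   # lightly loaded, far away
}
best = choose_node(nodes)   # → "dn2"
```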
Keywords/Search Tags:Cloud storage, Replica strategy, Distributed File System, Load Balancing, Support vector machine