Research On Dynamic Management Of Data Replicas In Heterogeneous Hadoop Cluster

Posted on: 2016-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: S Zhang
GTID: 2308330479476626
Subject: Computer Science and Technology
Abstract/Summary:
Replica management is one of the most important research areas for distributed file systems in cloud computing environments. A reasonable number of replicas and a sound replica placement strategy not only provide users with more reliable data access but also improve the load balancing and computational efficiency of the cloud computing platform.
First, this paper studies HDFS, the distributed file system of the Hadoop platform. It describes the HDFS replica management strategy in detail and analyzes the advantages and disadvantages of different replica management strategies. The block balancing strategy is also analyzed in detail.
Second, the limitations of the default block balancing strategy are analyzed, and a block balancing algorithm suited to heterogeneous environments is proposed. The algorithm calculates the theoretical space utilization of each node from factors such as the node's performance and storage space, and converts the input threshold into a per-node parameter, thereby achieving block balancing in a heterogeneous environment. The Hadoop source code is modified, recompiled, and deployed to build a test environment. Experimental results show that the algorithm distributes data more evenly in a heterogeneous environment and improves the overall performance of the cluster to a certain extent.
Third, this paper studies the replica decision algorithm and points out that file popularity varies greatly and that access to hotspot files affects working efficiency. However, a replica decision strategy based on current hotspots not only lags behind demand but is also affected by flash-crowd fluctuations. Therefore, replica decisions should be based on future popularity. A prediction model based on grey prediction is established, and a Markov model is adopted to correct the prediction deviation caused by sudden bursts of access.
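As an illustration of the grey prediction step, the following is a minimal sketch of a standard GM(1,1) forecast applied to a file's per-period access counts. It is not the thesis's implementation: the function name `gm11_forecast`, the sample access counts, and the choice of plain GM(1,1) are assumptions made here for illustration, and the Markov correction of the residuals is omitted.

```python
import math


def gm11_forecast(history, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    `history` is a hypothetical list of per-period access counts.
    """
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    if any(v <= 0 for v in history):
        raise ValueError("GM(1,1) expects strictly positive values")

    # Accumulated generating operation (1-AGO): running sums of the raw series.
    cumulative = []
    total = 0.0
    for v in history:
        total += v
        cumulative.append(total)

    # Background values: mean of each pair of consecutive accumulated values.
    z = [0.5 * (cumulative[k] + cumulative[k - 1]) for k in range(1, n)]
    y = list(history[1:])

    # Least-squares fit of y = slope * z + b, where slope = -a
    # (a is the development coefficient, b the grey input).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    a = -slope

    if abs(a) < 1e-12:
        # Degenerate case: no growth or decay, so the forecast is constant.
        return [b] * steps

    def accumulated(k):
        # Time-response function of the accumulated series.
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Differencing the accumulated forecast restores the original scale.
    return [accumulated(k) - accumulated(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical access counts for one file over five periods.
    print(gm11_forecast([10, 12, 14.4, 17.28, 20.736], steps=2))
```

The forecast popularity, rather than the currently observed popularity, would then feed the replica-count decision.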
Moreover, a limited-access service model is built on the file hotspot prediction to decide the number of replicas that meets users' demand.
Finally, this paper studies how to optimize the placement of new replicas during replica adjustment. Most current algorithms either pursue only a single optimization objective or depend heavily on expert knowledge and experience because of their complex models. A new replica placement strategy based on multi-objective optimization is proposed. First, several optimization objectives are defined, including the network traffic incurred when creating replicas and load balancing with respect to performance and disk space. Next, the multi-objective algorithm NSGA-II is introduced to analyze and solve the model. Finally, an optimal decision-making strategy based on individual density in the solution space is used to select the final placement scheme. Simulation results show that the algorithm converges well and verify its effectiveness. The Hadoop source code is then modified, and the proposed replica decision algorithm and replica placement strategy are tested and analyzed. Experimental results show that the improved algorithms effectively reduce access conflicts, improve system throughput, reduce the network load during replica placement, and improve overall system performance.
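To make the multi-objective idea concrete, the sketch below shows the Pareto dominance test at the core of NSGA-II and uses it to extract the non-dominated placement candidates. It is an illustrative fragment under assumptions, not the thesis's algorithm: the candidate names and objective values are hypothetical, all three objectives are assumed to be minimized, and the full NSGA-II loop and the density-based final selection are not shown.

```python
def dominates(a, b):
    """Return True if objective vector `a` Pareto-dominates `b`.

    All objectives are minimized: `a` must be no worse than `b` in every
    objective and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


if __name__ == "__main__":
    # Hypothetical placements scored as
    # (network traffic, performance imbalance, disk-space imbalance).
    placements = {
        "A": (1.0, 5.0, 3.0),
        "B": (2.0, 2.0, 2.0),
        "C": (3.0, 3.0, 3.0),
        "D": (4.0, 1.0, 1.0),
    }
    print(pareto_front(placements))
```

A single-objective method would have to collapse the three scores into one weighted sum. Keeping the whole non-dominated set defers that trade-off to a final selection step such as the density-based strategy described above.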
Keywords/Search Tags: Hadoop, HDFS, replica management strategy, load balance, replica number decision algorithm, replica placement strategy