
Research on Dynamic Management of Data Replicas in Heterogeneous Hadoop Clusters

Posted on: 2019-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330545960433
Subject: Communication and Information System
Abstract/Summary:
Storing and managing data and its replicas is an important problem in HDFS. The default data placement strategy assumes a homogeneous Hadoop cluster. In heterogeneous environments this assumption has limitations, because it can incur additional costs and degrade MapReduce performance. In this thesis, we design a dynamic data replica placement strategy that uses the grey prediction model to predict the hotness of data. The proposed strategy determines the number of replicas for each data block in real time, taking into account the hotness of the block and the performance characteristics of each node in a heterogeneous cluster, and adaptively adjusts the number of replicas according to the corresponding data hotness. The main contents of this thesis are as follows:
(1) For the data hotness prediction problem, we analyze a large number of access requests in historical data and identify several distinctive characteristics. We apply a grey prediction model to the access history of data blocks over a given time period to predict their hotness in the next time period.
(2) To address the limitations of static replica methods, we adopt a real-time, hotness-based replica method that combines a dynamic weight with the current hotness of a data block to decide its number of replicas.
(3) For heterogeneous clusters, we propose a dynamic data placement strategy that considers the characteristics of each node, including computing power, disk storage space, and IOPS (Input/Output Operations Per Second), among others. This strategy determines when a new replica is created and on which node it is placed.
(4) The proposed solution is tested and evaluated in a simulated Hadoop system. The results show that the proposed dynamic data replica placement strategy outperforms the default static data placement strategy in terms of execution time, response time, and network access contention.
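The abstract describes steps (1) to (3) only at a high level, so the following is a minimal sketch under stated assumptions, not the thesis's actual method. The GM(1,1) part follows the standard textbook formulation of the grey model: accumulated generating operation, least-squares estimation of the development coefficient a and grey input b, and the restored one-step forecast. Everything else is hypothetical: the dynamic weight 1/(1 + last relative error), the threshold of 100 expected accesses per replica, the upper bound of 10 replicas, and the linear node score with weights (0.4, 0.3, 0.3). The names `gm11_predict`, `replica_count`, `choose_nodes`, and `Node` are illustrative only. The lower bound of 3 replicas mirrors the HDFS default replication factor.

```python
import math
from dataclasses import dataclass


def gm11_predict(history):
    """Fit a GM(1,1) grey model to a non-negative series and forecast the next value."""
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    # Accumulated generating operation (AGO): x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for v in history:
        total += v
        x1.append(total)
    # Background values z1(k) = (x1(k) + x1(k-1)) / 2 for k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = [float(v) for v in history[1:]]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(zi * zi for zi in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        return float(history[-1])  # no usable trend (e.g. an all-zero series)
    # Least-squares estimate of a, b in the grey equation x0(k) + a*z1(k) = b
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    if abs(a) < 1e-12:
        return b  # a == 0: the fitted model is a constant series with value b
    # Restored forecast: x0_hat(n+1) = (1 - e^a) * (x0(1) - b/a) * e^(-a*n)
    return (1 - math.exp(a)) * (history[0] - b / a) * math.exp(-a * n)


def replica_count(predicted, current, last_rel_error, per_replica=100.0, min_r=3, max_r=10):
    """Hypothetical rule: blend predicted and current hotness, then map it to a replica count."""
    # Dynamic weight: trust the forecast less when its previous relative error was large
    w = 1.0 / (1.0 + max(0.0, last_rel_error))
    hotness = w * predicted + (1.0 - w) * current
    # One replica per `per_replica` expected accesses, clamped to [min_r, max_r]
    return max(min_r, min(max_r, math.ceil(hotness / per_replica)))


@dataclass
class Node:
    name: str
    cpu: float        # relative computing power
    free_disk: float  # free disk space in GB
    iops: float       # measured input/output operations per second


def choose_nodes(nodes, k, holders, weights=(0.4, 0.3, 0.3)):
    """Rank nodes that do not yet hold the block and return the names of the top k."""
    candidates = [nd for nd in nodes if nd.name not in holders]
    if not candidates or k <= 0:
        return []
    # Normalize each metric by the best candidate so all terms lie in [0, 1]
    max_cpu = max(nd.cpu for nd in candidates) or 1.0
    max_disk = max(nd.free_disk for nd in candidates) or 1.0
    max_iops = max(nd.iops for nd in candidates) or 1.0
    wc, wd, wi = weights

    def score(nd):
        return wc * nd.cpu / max_cpu + wd * nd.free_disk / max_disk + wi * nd.iops / max_iops

    ranked = sorted(candidates, key=score, reverse=True)
    return [nd.name for nd in ranked[:k]]


if __name__ == "__main__":
    history = [300, 380, 470, 600]  # made-up access counts of one block per period
    predicted = gm11_predict(history)
    target = replica_count(predicted, current=history[-1], last_rel_error=0.1)
    holders = {"dn1", "dn2", "dn3"}  # nodes already storing the block
    nodes = [Node("dn1", 1.0, 500, 800), Node("dn2", 2.0, 300, 1200),
             Node("dn3", 1.5, 800, 600), Node("dn4", 2.5, 900, 1500),
             Node("dn5", 1.0, 200, 400)]
    extra = choose_nodes(nodes, target - len(holders), holders)
    print(f"forecast={predicted:.1f} replicas={target} new nodes={extra}")
```

Normalizing each node metric by the best candidate keeps CPU, disk, and IOPS on a comparable scale, so the weights express relative priorities. A real deployment would also need to respect HDFS rack awareness and decide when surplus replicas are removed, which this sketch ignores.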
Keywords/Search Tags: Hadoop, heterogeneous cluster, data replica management strategy, dynamic data replica placement, grey prediction