Research And Implementation Of Hadoop Load Balancing Strategy In Heterogeneous Environment

Posted on: 2019-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Teng
Full Text: PDF
GTID: 2348330545455584
Subject: Cryptography
Abstract/Summary:
The volume of data generated by production and everyday life is growing explosively, and traditional technology can no longer meet big data processing requirements. Hadoop is widely used as a big data processing tool. However, Hadoop's default task allocation and replica placement strategies are designed for homogeneous environments. In a heterogeneous environment, node performance differs, so the default strategies easily lead to unreasonable task assignment and uneven data placement, which can leave the system's load imbalanced. To address this, this thesis proposes a task allocation strategy and a non-random replica placement strategy for heterogeneous environments. In addition, a monitoring system for dynamic parameter configuration and performance visualization is designed, so that the variable parameters in the strategies can be adjusted for different application scenarios and the load status of the cluster can be observed in real time. The specific work is as follows:

1. A MapReduce task allocation strategy for heterogeneous environments is proposed. Under the new strategy, whether a node is assigned a new task is decided according to the load of that node. When node load is evaluated, heterogeneity is taken into account: different evaluation criteria are adopted for different nodes, and disk, memory, CPU and other factors are included to eliminate the impact of heterogeneity. Experiments show that in a heterogeneous Hadoop cluster, task execution speed improves by 6%-10%. Task allocation is more balanced, so the cluster load is more balanced and the system is more efficient.

2. A non-random replica placement strategy for heterogeneous environments is proposed. The random selection of replica placement nodes in Hadoop's original replica placement strategy is replaced by selection based on node load. When node load is judged, differences in node performance are taken into consideration: the performance of a node is measured by the proportion of the cluster's total resources that the node contributes, which accounts for heterogeneity. Experiments show that the new strategy allocates data in proportion to node performance, and that the load on each node is relatively balanced.

3. Based on the above two strategies, a system is implemented that allows the strategy parameters to be changed and cluster performance to be visualized according to the application scenario. The parameters in the improved strategies can be adjusted for specific application scenarios, such as the different resources that tasks require, and the resulting charts can be used to observe the cluster load as guidance for manually adjusting the parameters.
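
The abstract does not give the formulas used in the thesis, so the following Python fragment is only an illustrative sketch of the two ideas it describes: scoring each node's load relative to its own capacity, and choosing replica targets according to each node's share of total cluster resources. The names `Node`, `load_score`, `pick_task_node`, `capacity_share` and `pick_replica_nodes`, the weights, and the sample cluster are all hypothetical and are not taken from the thesis or from the Hadoop API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one cluster node: capacities and current usage."""
    name: str
    cpu_cores: float
    mem_gb: float
    disk_gb: float
    cpu_used: float
    mem_used: float
    disk_used: float


def load_score(node, weights=(0.4, 0.3, 0.3)):
    """Weighted utilisation of CPU, memory and disk.

    Each resource is divided by the node's own capacity, so a small node
    and a large node are judged on the same 0..1 scale. The weights are
    assumed tunable parameters, not values from the thesis.
    """
    w_cpu, w_mem, w_disk = weights
    return (w_cpu * node.cpu_used / node.cpu_cores
            + w_mem * node.mem_used / node.mem_gb
            + w_disk * node.disk_used / node.disk_gb)


def pick_task_node(nodes, weights=(0.4, 0.3, 0.3)):
    """Assign the next task to the node with the lowest relative load."""
    return min(nodes, key=lambda n: load_score(n, weights))


def capacity_share(node, nodes):
    """Fraction of the cluster's total disk capacity this node contributes."""
    return node.disk_gb / sum(n.disk_gb for n in nodes)


def pick_replica_nodes(nodes, replicas=3):
    """Prefer nodes that hold less data than their capacity share suggests."""
    total_used = sum(n.disk_used for n in nodes) or 1.0

    def deficit(n):
        # Positive when the node stores less than its proportional share.
        return capacity_share(n, nodes) - n.disk_used / total_used

    return sorted(nodes, key=deficit, reverse=True)[:replicas]


# Hypothetical three-node heterogeneous cluster.
cluster = [
    Node("big", 16, 64, 4000, cpu_used=8, mem_used=32, disk_used=1000),
    Node("small", 4, 16, 1000, cpu_used=3, mem_used=12, disk_used=900),
    Node("mid", 8, 32, 2000, cpu_used=2, mem_used=8, disk_used=1500),
]
task_target = pick_task_node(cluster)
replica_targets = pick_replica_nodes(cluster, replicas=2)
```

In this sketch the weights play the role of the adjustable strategy parameters mentioned in item 3: a disk-heavy workload could, for example, raise the disk weight so that nearly full nodes receive fewer new tasks.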
Keywords/Search Tags: Load balancing, Heterogeneous cluster, Replica placement, Hadoop