
Research On Optimization Of Data Load Balancing In Hadoop Clusters And Application Of Hadoop Platform

Posted on: 2019-02-28; Degree: Master; Type: Thesis
Country: China; Candidate: L C Yu; Full Text: PDF
GTID: 2428330566468736; Subject: Computer technology
Abstract/Summary:
In the big data environment, the traditional stand-alone processing model can no longer meet the growing demands of big data. As a result, cluster architectures have emerged and are gradually replacing it. Compared with applications built on the traditional stand-alone model, the performance and efficiency of cluster-based applications are affected by many factors, and load balancing is one of the most important. Hadoop has become a major tool for developing and deploying big data applications, and load balancing plays a crucial role in the performance of applications running on Hadoop clusters. A sound load balancing strategy can effectively improve the performance of these applications and give Hadoop users a better experience. With the growth of cluster applications, load balancing has become a research hotspot in cluster environments. Scholars in China and abroad have carried out extensive research on load balancing in Hadoop clusters, including task scheduling strategies and data load balancing. However, as Hadoop systems continue to grow and application environments become increasingly complex, many factors that influence load balancing in Hadoop clusters still need to be considered, and many policies need continued optimization. This thesis focuses on optimizing data load balancing for the two core components of Hadoop, HDFS and MapReduce, in a cluster environment. In addition, to gain practical experience with Hadoop in the field of big data, this thesis uses data from the forum module of an e-commerce platform to study the application of the Hadoop platform, and uses that application to test and analyze the preceding load balancing optimizations. The key tasks of the project are as follows.

(1) This thesis analyzes the threshold-setting characteristics of the default load balancing strategy of HDFS, as well as existing algorithms that optimize the default policy. Based on an analysis of the overall HDFS architecture and of the processing flow and processing objects of its storage function, a prediction model is introduced to predict the attributes of the files that a node processes. The time-impact factor of the dynamic threshold model is then analyzed from the characteristics of the nodes under consideration and the predicted file attributes, and the resulting time-impact factor is substituted into the dynamic threshold model to compute the final threshold. Finally, the resulting threshold is supplied to the load balancing strategy for balancing optimization. Experimental analysis concludes that this optimization significantly improves the storage efficiency of HDFS clusters.

(2) This thesis analyzes the MapReduce parallel computing framework and its operating principle, examines existing work on data load balancing in MapReduce in depth, and optimizes Reduce-side load balancing in a MapReduce cluster environment. To implement dynamic Reduce-based load balancing, a dynamic lightweight partitioning strategy is adopted. The strategy has four main aspects: a dynamic design of the sampling scale that takes Reducer load information into account; a lightweight design of the sampling method; determination of the number of Reducers from the sampled data and node performance; and formulation of the partitioning strategy from an analysis of the sampling results and Reducer load information. Experimental analysis concludes that this optimization significantly improves the performance of parallel computing in MapReduce clusters.

(3) This thesis studies post classification on the optimized Hadoop platform, using the large and complex data set of posts in the e-commerce forum module. To complete the classification of posts, this thesis analyzes the general process of post classification, its specific implementation on the Hadoop platform, and the construction of the Hadoop cluster. Finally, based on an analysis of the classification results and the processing-time efficiency of the platform, this thesis demonstrates the advantages of the Hadoop platform for application development in the big data field and the effectiveness of the data load balancing optimizations in Hadoop clusters.
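To make the Reduce-side partitioning idea in item (2) more concrete, the following is a minimal Python sketch of sample-driven partitioning. It is not the thesis's algorithm, whose details the abstract does not give. It assumes a generic greedy scheme: count key frequencies in a sample of map output, then assign the heaviest keys first to whichever reducer currently has the least load. The names `build_partition_table`, `partition`, `sampled_keys`, and `reducer_loads` are hypothetical.

```python
import heapq
from collections import Counter


def build_partition_table(sampled_keys, reducer_loads):
    """Map each sampled key to a reducer index by greedy least-loaded assignment.

    sampled_keys: iterable of keys drawn from map output (hypothetical sample).
    reducer_loads: current load of each reducer, in the same units as key counts.
    """
    counts = Counter(sampled_keys)
    # Min-heap of (load, reducer_index): the least-loaded reducer is on top.
    heap = [(load, idx) for idx, load in enumerate(reducer_loads)]
    heapq.heapify(heap)
    table = {}
    # Place the heaviest keys first so large keys do not pile up on one reducer.
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, idx = heapq.heappop(heap)
        table[key] = idx
        heapq.heappush(heap, (load + count, idx))
    return table


def partition(key, table, num_reducers):
    """Return the reducer index for a key."""
    if key in table:
        return table[key]
    # Keys absent from the sample fall back to hash partitioning.
    # Python randomizes string hashes per process, so a real job would need
    # a stable hash function here; this fallback is for illustration only.
    return hash(key) % num_reducers


sample = ["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"]
table = build_partition_table(sample, reducer_loads=[0, 0])
```

With this sample and two idle reducers, key `a` (6 records) goes to one reducer and keys `b`, `c`, and `d` (6 records in total) go to the other, whereas plain hash partitioning would ignore the skew toward `a`.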
Keywords/Search Tags:cluster, HDFS, MapReduce, load balancing, post classification