
Research Of Hadoop Cluster System Performance Optimization

Posted on: 2014-09-19  Degree: Master  Type: Thesis
Country: China  Candidate: M Xiang  Full Text: PDF
GTID: 2268330425967355  Subject: Computer system architecture
Abstract/Summary:
The value of cloud computing in commerce and science has gradually been recognized by society. It plays a major role in search engines, Internet application technology, and large-scale data computing. Hadoop, as an open-source implementation of cloud computing technology, has played a very important role in that development. Most enterprises and scientific research institutions now use the Hadoop cloud computing platform. With its simple parallel programming model, large data storage capacity, and efficient computing power, Hadoop provides a good user experience. However, because Hadoop has been under development for a relatively short time, many aspects of the system can still be refined to improve its performance. Research on Hadoop system performance is therefore necessary and significant.

Hadoop's system configuration parameters and its task-level scheduling algorithm both have an important impact on system performance. The system parameters govern how system resources are used at every stage of cluster work, and the task-level scheduling algorithm is the key to task allocation in Hadoop. There is no uniform model for determining parameter values or for allocating tasks; both are complex problems, and research on them remains at an early stage. This thesis therefore studies Hadoop system performance optimization from these two aspects.

This thesis focuses on analyzing the execution capacity of the cluster nodes. To help a Hadoop cluster cope with the performance impact of varied workloads and of differences among the cluster nodes themselves, we design a TaskConfigure server and build a Hadoop cluster parameter information system that automatically tunes cluster configuration parameters. To address the cluster load imbalance caused by Hadoop's built-in task-level scheduling, we propose an adaptive task scheduling method based on node capability.

The parameter information system generates optimized cluster configuration values from each node's resource-use efficiency. It then classifies the cluster nodes and tasks and assigns configuration parameters by category, so that every node in the cluster performs tasks under appropriate configuration parameters. Meanwhile, to improve load balancing, the method uses node performance, task characteristics, and node failure rate to compute a node weight proportional parameter, which serves as the basis for the number of tasks allocated to each node. It also adds a check of each node's own load state and adaptively adjusts the number of tasks running on each node. Experimental results show that total task completion time was significantly reduced, the load across nodes became more balanced, and node resource utilization became more reasonable, giving the cluster good stability and scalability.
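
The abstract does not give the formula for the node weight proportional parameter, so the following Python sketch is only an illustration of the general idea, not the thesis's actual method. It assumes a hypothetical weight equal to performance times task fit times (1 - failure rate), splits tasks in proportion to those weights, and then nudges a node's task count according to its current load. The names `Node`, `node_weight`, `allocate_tasks`, `adjust_for_load`, and the load thresholds are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    performance: float    # relative capability score (assumed, e.g. from a benchmark)
    task_affinity: float  # how well the node suits the task type, in (0, 1] (assumed)
    failure_rate: float   # fraction of past tasks that failed on this node, in [0, 1)


def node_weight(node: Node) -> float:
    """Hypothetical weight: capability scaled by task fit and reliability."""
    return node.performance * node.task_affinity * (1.0 - node.failure_rate)


def allocate_tasks(nodes, total_tasks):
    """Split total_tasks across nodes in proportion to their weights.

    Largest-remainder rounding keeps the integer shares summing to total_tasks.
    """
    weights = [node_weight(n) for n in nodes]
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one node must have a positive weight")
    exact = [total_tasks * w / total_weight for w in weights]
    shares = [int(x) for x in exact]
    leftover = total_tasks - sum(shares)
    # Hand the remaining tasks to the nodes with the largest fractional parts.
    order = sorted(range(len(nodes)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return {n.name: s for n, s in zip(nodes, shares)}


def adjust_for_load(share, load, high=0.8, low=0.3):
    """Adaptive step: one fewer task on an overloaded node, one more on an idle one.

    `load` runs from 0.0 (idle) to 1.0 (saturated); the thresholds are assumed.
    """
    if load > high:
        return max(0, share - 1)
    if load < low:
        return share + 1
    return share
```

In this sketch a node with a high failure rate receives a smaller share even if its raw performance matches a reliable node's, which is the kind of load-balancing effect the abstract describes. A real scheduler would recompute the weights as measurements change.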
Keywords/Search Tags: Cloud computing, Hadoop cluster parameter information system, TaskConfigure server, Node weight proportional parameter, Adaptive task scheduling algorithm