
The Establishment And Optimization Of The Distributed JobTracker Node Model In Cloud Computing

Posted on: 2017-04-28 | Degree: Master | Type: Thesis
Country: China | Candidate: H L Yang
GTID: 2308330485490007 | Subject: Computer Science and Technology
Abstract/Summary:
Following the development of mainframes and personal computers, cloud computing is regarded as the fourth IT industrial revolution; Google was the first to define and develop it. Hadoop, an open-source framework written in Java, runs data-intensive distributed applications and processes large data sets in a distributed manner, but single points of failure have long been its performance bottleneck. In HDFS, the storage architecture around the single NameNode has already been optimized, and Hadoop 2.0 introduced a multi-node high-availability solution for it; no corresponding solution has yet been offered for the single JobTracker node. This thesis aims to address JobTracker failure in the traditional model by building a distributed model that automatically avoids the single-point failure of the JobTracker.

The main contents and contributions of this thesis are as follows. First, it analyzes earlier work on optimizing the single-node model, on scheduling algorithms, and on load balancing algorithms. It then builds the distributed JobTracker node model, optimizes the communication model with Dijkstra's algorithm, optimizes the job scheduling model based on PageRank, and finally optimizes node load based on the Counting Bloom Filter algorithm.
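The abstract does not say how Dijkstra's algorithm is applied to the communication model. As a minimal sketch, assume the cluster is modeled as a weighted graph whose edge weights are link costs (for example, measured latency) and that a client contacts the live JobTracker with the cheapest path, never routing through failed nodes. The topology, node names, and the `select_jobtracker` helper below are hypothetical illustrations, not the thesis's implementation.

```python
import heapq
import math
from typing import AbstractSet, Dict, List, Optional, Tuple

# Hypothetical topology: node -> [(neighbor, link cost)], e.g. latency in ms.
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str,
             down: AbstractSet[str] = frozenset()) -> Dict[str, float]:
    """Minimum total link cost from `source` to each reachable node, skipping `down` nodes."""
    dist: Dict[str, float] = {source: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry: a shorter path to this node was already settled
        settled.add(node)
        for neighbor, cost in graph.get(node, []):
            if neighbor in down:
                continue  # failed nodes neither receive nor relay messages
            candidate = d + cost
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def select_jobtracker(graph: Graph, client: str, jobtrackers: List[str],
                      down: AbstractSet[str]) -> Optional[str]:
    """Return the reachable, live JobTracker with the lowest path cost, or None."""
    dist = dijkstra(graph, client, down)
    reachable = [jt for jt in jobtrackers if jt in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda jt: dist[jt])


# Example topology: the client reaches jt1 directly and jt2/jt3 through a switch "sw".
TOPOLOGY: Graph = {
    "client": [("jt1", 5.0), ("sw", 1.0)],
    "sw": [("client", 1.0), ("jt2", 2.0), ("jt3", 6.0)],
    "jt1": [("client", 5.0)],
    "jt2": [("sw", 2.0)],
    "jt3": [("sw", 6.0)],
}
JOBTRACKERS = ["jt1", "jt2", "jt3"]
```

With every node up, the sketch picks `jt2` (path cost 3 via the switch); if `jt2` fails, it falls back to `jt1` (cost 5). A real deployment would also need failure detection to populate `down`, which this sketch takes as given.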
After analyzing the communication mode of the distributed JobTracker node model and the related scheduling optimizations, a small experimental Hadoop cluster was set up to verify the results. Comparing the single-JobTracker model with the distributed model, the experiments show the following: when cluster nodes go down, the distributed JobTracker node model is more reliable, and the Dijkstra-based communication method selects a JobTracker node more quickly; for jobs with dependencies, the improved PageRank scheduling algorithm improves overall processing time; and the improved load balancing algorithm optimizes how data replicas are stored, improving the handling of duplicated data storage. In terms of overall cluster performance, the results show that the gains under the distributed node model come mainly from job-specific optimizations and improvements, and its general performance is not as good as that of the original cluster. When a JobTracker node goes down, however, the distributed model improves the safety and reliability of the cluster, which is of some value for jobs in special scenarios.
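The abstract does not describe how the Counting Bloom Filter drives load balancing. One plausible reading, assumed here, is that each node keeps a counting Bloom filter of the block IDs it stores, so that replica placement can avoid putting a second copy of a block on a node that (probably) already holds it. The Python sketch below shows the standard data structure plus a hypothetical `choose_replica_target` helper; the filter size, hash count, and least-load tie-breaking rule are illustrative assumptions, not the thesis's method.

```python
import hashlib
from typing import Dict, List, Optional


class CountingBloomFilter:
    """Bloom filter whose cells are counters instead of bits, so items can be removed."""

    def __init__(self, size: int = 1024, num_hashes: int = 4) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters: List[int] = [0] * size

    def _positions(self, item: str) -> List[int]:
        # Derive num_hashes cell indices by salting SHA-256 with the hash index.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.size
            for i in range(self.num_hashes)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Rejects items that are definitely absent; it cannot detect a false positive,
        # so callers must only remove items they actually added.
        if item not in self:
            raise KeyError(item)
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # False positives are possible; false negatives are not (given correct removals).
        return all(self.counters[pos] > 0 for pos in self._positions(item))


def choose_replica_target(block_id: str,
                          node_filters: Dict[str, CountingBloomFilter],
                          node_load: Dict[str, int]) -> Optional[str]:
    """Hypothetical rule: skip nodes that probably hold the block, then pick the least loaded."""
    candidates = [node for node, f in node_filters.items() if block_id not in f]
    if not candidates:
        return None
    return min(candidates, key=lambda node: node_load[node])
```

Because a Bloom filter has no false negatives, a node that really holds the block is never chosen; a false positive only means an eligible node is occasionally skipped. Using counters rather than bits is what allows a block's entry to be cleared when its replica is deleted or moved.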
Keywords/Search Tags: Hadoop, MapReduce, JobTracker, Distributed Communication, Scheduling Algorithm, Load Balancing