The Establishment And Optimization Of The Distributed JobTracker Node Model In Cloud Computing

Posted on:2017-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:H L Yang

Full Text:PDF

GTID:2308330485490007

Subject:Computer Science and Technology

Abstract/Summary:

Cloud computing is regarded as the fourth revolution in the IT industry, following the development of large-scale computers and personal computers, and Google was the first to define and develop it. Hadoop, an open-source framework written in Java, runs data-intensive distributed applications and processes large data sets in a distributed fashion, but single points of failure have always been a bottleneck for Hadoop. In the HDFS storage architecture the single NameNode has already been addressed: Hadoop 2.0 proposed a multi-node high-availability solution. No corresponding solution has been given for the single JobTracker node. This thesis aims to address JobTracker failure in the traditional model by building a distributed model that automatically avoids the failure of a single JobTracker node. The main contents and contributions are as follows. First, the thesis analyzes prior work on optimizing the single-node model, on scheduling algorithms, and on load-balancing algorithms. It then sets up the distributed JobTracker node model, optimizes the communication model with the Dijkstra algorithm, optimizes the job scheduling model based on PageRank, and finally optimizes node load based on the Counting Bloom Filter algorithm.
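As an illustration of Dijkstra-based node selection, the following is a minimal Python sketch rather than the thesis's implementation. It assumes the cluster can be described as a weighted graph of link costs; the node names (`tt1`, `sw1`, `jt1`, ...), the cost values, and the helper `nearest_job_tracker` are all hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, cost in graph.get(node, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def nearest_job_tracker(graph, source, job_trackers):
    """Pick the reachable JobTracker with the lowest path cost (hypothetical helper)."""
    dist = dijkstra(graph, source)
    reachable = [jt for jt in job_trackers if jt in dist]
    if not reachable:
        return None
    # Ties are broken by node name so the choice is deterministic.
    return min(reachable, key=lambda jt: (dist[jt], jt))


# Hypothetical topology: one TaskTracker, two switches, two JobTracker nodes.
links = {
    "tt1": {"sw1": 1, "sw2": 4},
    "sw1": {"tt1": 1, "sw2": 1, "jt1": 5},
    "sw2": {"tt1": 4, "sw1": 1, "jt2": 2},
    "jt1": {"sw1": 5},
    "jt2": {"sw2": 2},
}

chosen = nearest_job_tracker(links, "tt1", ["jt1", "jt2"])
fallback = nearest_job_tracker(links, "tt1", ["jt1"])  # jt2 treated as down
```

In this sketch a failed JobTracker is handled simply by leaving it out of the candidate list, so the next-cheapest node is selected.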
After analyzing the communication mode of the distributed JobTracker node model and the related scheduling optimizations, a small experimental Hadoop cluster was set up to verify the results. The experiments compare the single JobTracker model with the distributed model in a cluster. They show that, when the cluster suffers downtime, the distributed JobTracker node model is more reliable, and that the communication method based on the Dijkstra algorithm selects a JobTracker node more quickly. For scheduling jobs that have dependencies, the improved PageRank algorithm reduces the overall processing time. For the improved load-balancing algorithm, the handling of duplicate copies in storage is optimized, which improves how duplicated data is stored. In terms of overall cluster performance, the experimental results show that the optimizations under the distributed node model mainly benefit specific kinds of jobs, and general performance is not as good as that of the original cluster. During JobTracker node downtime, however, the distributed model improves the safety and reliability of the cluster, which makes it useful for jobs in special scenarios.
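The abstract does not say how the Counting Bloom Filter is wired into load balancing, so the following Python sketch only shows the data structure itself under stated assumptions: it tracks which block identifiers a node already holds so that a duplicate copy can be detected before it is stored. The class layout, the SHA-256-based hashing, the default sizes, and the block names are all hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with per-slot counters, so items can also be removed."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive num_hashes slot indexes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item):
        """Decrement the item's counters; return False if it was not present."""
        if item not in self:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True


# Hypothetical usage: skip storing a block the node already appears to hold.
stored_blocks = CountingBloomFilter()
stored_blocks.add("block_0001")
is_duplicate = "block_0001" in stored_blocks
removed = stored_blocks.remove("block_0001")
```

Unlike a plain Bloom filter, the counters allow an entry to be removed when a block is deleted or moved, which is the property that makes the structure usable for data that changes over time.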

Keywords/Search Tags:

Hadoop, MapReduce, JobTracker, Distributed Communication, Scheduling Algorithm, Load Balancing

Related items

1	Based On Feedback Scheduling Algorithms For Dynamic Load Balancing In The Heterogeneous Environment Of Hadoop Design And Implementation
2	Research On Dynamic Load Balancing Method Of Distributed Crawler System
3	Research On Load Balancing Algorithm For Scheduling Based On Hadoop
4	Based On The Agent - Aid Map - Reduce Architecture Research And Design Load Balancing Optimization
5	Research On Hadoop Distributed System Of Scheduling Algorithm
6	Research On Scheduling Algorithm Based On Hadoop
7	Optimization And Research On Reduce Task Scheduling Strategy And Data Skew On Hadoop
8	The Research On Distributed Task Scheduling Algorithms Based On Hadoop Platform
9	Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop
10	Research And Design On An Efficient Load Balancing Algorithm