Real-time Performance Monitoring And I/O Performance Optimization Research On Hadoop Cluster

Posted on: 2016-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhu
Full Text: PDF
GTID: 2428330473465676
Subject: Software engineering
Abstract/Summary:
With the rapid development of big data, Hadoop, as a representative technology for massive data processing, is attracting more and more attention. Building on the advantages of an open-source framework for distributed systems, Hadoop implements the Hadoop Distributed File System and the MapReduce distributed computing framework. Its high scalability, high fault tolerance, and low cost have made it widely used in data centers, social media, log analysis, and other big data applications. Today, many companies deploy more than a thousand MapReduce nodes, and some deploy several thousand. With so many nodes, a cluster is difficult to manage, especially when individual nodes have performance problems. Establishing a performance monitoring system for Hadoop cluster nodes is therefore very important for keeping the cluster running normally and efficiently.

Many third-party tools can monitor a Hadoop cluster. However, their limited set of monitoring indicators leads to inadequate monitoring granularity, or they cannot collect, analyze, and present data simultaneously, so the monitoring is not real-time. Real-time performance monitoring has therefore become one of the key research topics for Hadoop clusters. This thesis extends Ganglia, a real-time monitoring system. By using the JMX interface that Hadoop supports to obtain more monitoring indicators, we achieve full performance monitoring of a Hadoop cluster while it runs tasks. We integrate Nagios as an alarm module that warns of faults and errors in the cluster. We also replace RRD, Ganglia's traditional database, with MongoDB as the database of the real-time performance monitoring system, so that monitoring data can be preserved for a long time and historical data remains available for later analysis, report generation, and optimization decisions.

In this thesis, we use the monitoring system to monitor the target data of three Join tasks (Map Join, Reduce Join, and Semi Join) and build formulas for the I/O cost of the whole MapReduce task, which provides a useful perspective for analyzing the I/O cost inside the MapReduce framework. The collected data then serves as the basis for assessing efficiency from the monitoring point of view, which demonstrates the practical value of the real-time monitoring tool.
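The JMX-based collection step mentioned in the abstract can be illustrated with a minimal sketch. Hadoop daemons expose their JMX beans as JSON through a `/jmx` HTTP servlet, in the form of a top-level `beans` list. The code below assumes such a payload has already been fetched and flattens its numeric attributes into one record per metric, a shape that could be stored in a document database such as MongoDB. The function name `flatten_jmx_beans`, the record fields, and the sample values are hypothetical and are not taken from the thesis.

```python
import json
import time


def flatten_jmx_beans(payload, timestamp=None):
    """Turn a Hadoop-style /jmx JSON payload into flat metric records.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    ts = time.time() if timestamp is None else timestamp
    records = []
    for bean in json.loads(payload).get("beans", []):
        bean_name = bean.get("name", "unknown")
        for attr, value in bean.items():
            # Keep numeric attributes only; bool is a subclass of int, so skip it.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                records.append(
                    {"bean": bean_name, "metric": attr, "value": value, "ts": ts}
                )
    return records


# Illustrative payload; the bean name and values are made up for this sketch.
sample = json.dumps(
    {
        "beans": [
            {
                "name": "Hadoop:service=DataNode,name=DataNodeActivity",
                "BytesRead": 1024,
                "BytesWritten": 2048,
                "tag.Hostname": "node-1",
            }
        ]
    }
)

records = flatten_jmx_beans(sample, timestamp=0)
for record in records:
    print(record)
```

Storing one record per metric with a timestamp keeps the full history queryable, which matches the abstract's stated reason for moving away from RRD: long-term retention of monitoring data for later analysis.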
Keywords/Search Tags: Hadoop, Ganglia, Nagios, MongoDB, Hadoop Join, Hadoop I/O, performance optimization