The Research And Implementation Of Hadoop Cluster Monitoring System Based On Ganglia

Posted on: 2018-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wei
GTID: 2348330518999380
Subject: Engineering
Abstract/Summary:
As an open source distributed computing framework, Hadoop is widely used in big data analysis and processing. As it has developed rapidly, the data Hadoop handles has grown larger and more complex. If a cluster fails or crashes, the damage can be irreparable, so the stability of Hadoop is a real concern. To ensure the stability and robustness of a working cluster, it is important to develop an integrated and complete cluster monitoring system.

Building on the open source monitoring frameworks Ganglia and Nagios, this thesis implements a cluster monitoring system that collects monitoring data, stores it persistently, displays it, and raises threshold-based alarms. Data collection is handled by the open source monitoring frameworks. The collected data is then loaded into a data warehouse by an ETL tool, which supports long-term storage and diverse, fast queries. The display function presents real-time and historical monitoring data on a front-end page built with Java Web technology. The alarm mechanism derives the pattern of change in the data through statistical analysis of all historical data; when the value of a monitored metric exceeds its threshold, it sends out the related alarm information, which helps with troubleshooting and cluster optimization.

During development, the data warehouse was designed using table splitting and partitioning. Monitoring data from different hosts and different years is kept separate, and data from different months is divided into different blocks. The good scalability of the data warehouse benefits the storage of large amounts of monitoring data and also allows various forms of historical data to be queried quickly. For the statistical analysis of large amounts of historical data, a design scheme assigns the work to each day, which greatly simplifies the statistical calculation and quickly reflects the pattern of change in the historical data. Finally, a threshold calculation scheme based on confidence intervals provides strong support for setting appropriate thresholds.

After a series of tests, the cluster monitoring system is shown to be correct and sufficiently robust, capable of monitoring the cluster as a whole, and able to achieve the goal of ensuring the stability and robustness of a working cluster. Finally, the thesis summarizes the research process, analyzes the merits and demerits of the system, and puts forward suggestions for extending and improving the alarm mechanism beyond thresholds, including real-time monitoring of log information, dynamic prediction of monitoring data, and automatic troubleshooting of cluster failures. These further studies would make the system more intelligent and applicable to a wider range of areas.
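
The per-day statistics and the confidence-interval threshold described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the metric is treated as roughly normally distributed, the band is taken as the mean plus or minus z standard deviations of the per-day means, and the function names `daily_means`, `threshold_band`, and `exceeds` are hypothetical rather than taken from the thesis.

```python
from statistics import NormalDist, mean, stdev


def daily_means(samples):
    """Group (day, value) samples by day and return the mean value per day.

    Assumption: one summary statistic per day stands in for the
    "assign tasks to each day" scheme mentioned in the abstract.
    """
    by_day = {}
    for day, value in samples:
        by_day.setdefault(day, []).append(value)
    return {day: mean(values) for day, values in by_day.items()}


def threshold_band(values, confidence=0.95):
    """Return (low, high) bounds as mean +/- z * sample standard deviation.

    Assumption: values are roughly normal; z is the two-sided normal
    quantile for the requested confidence level (about 1.96 for 0.95).
    Requires at least two values.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mu = mean(values)
    sigma = stdev(values)
    return mu - z * sigma, mu + z * sigma


def exceeds(value, band):
    """True when a new observation falls outside the threshold band."""
    low, high = band
    return value < low or value > high


if __name__ == "__main__":
    # Hypothetical CPU-load samples tagged with the day they were collected.
    samples = [("d1", 10), ("d1", 14), ("d2", 11), ("d2", 13), ("d3", 15), ("d3", 13)]
    per_day = daily_means(samples)
    band = threshold_band(list(per_day.values()))
    print(per_day, band, exceeds(25, band))
```

In a setup like this, an alarm would be raised only when `exceeds` returns true for a fresh reading; a higher confidence level widens the band and reduces the number of alarms.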
Keywords/Search Tags:Hadoop, Ganglia, Nagios, ETL, Confidence interval