Research And Application Of Big Data Platform Operation Monitoring System

Posted on: 2017-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 2308330485458214
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of big data technology, more and more Internet companies have begun to deploy their projects on clusters. In practice, a cluster is rich in resources and its environment is complex, so ensuring its normal operation is particularly important, and operation monitoring systems for big data platforms have emerged in response. Such a system monitors the operation of the cluster, the resources of its nodes, and the running condition of the computing tasks on each node. It detects anomalies promptly and raises alarms so that scheduled tasks can complete smoothly.

This thesis studies and analyzes the key technologies of cluster monitoring in depth. Motivated by the urgent need for real-time monitoring of cluster health, and to address the limitations of existing systems in operation monitoring, it focuses on methods for monitoring cluster task operation and proposes an implementation scheme for a monitoring agent and monitoring plugins based on the SNMP protocol. A big data cluster experiment environment was built and the scheme was verified experimentally. The results show that the monitoring strategy is effective and feasible, that it can ensure the stable operation of the big data platform, and that it meets the practical needs of a cluster monitoring system. The main work and research results of this thesis are as follows:

(1) By studying the running status of the big data platform, a system of cluster health indicators is proposed, and three levels of monitoring indicators are identified: cluster, node performance, and task operation.

(2) Because big data clusters are rich in hardware resources, a method for monitoring cluster performance is proposed. Built on the Icinga monitoring platform, it monitors cluster resources and node performance through monitoring plugins and the NRPE monitoring agent, and it can raise fault alarms by mail, SMS, and other channels so that anomalies are detected and handled in time.

(3) Based on the distributed architecture of big data clusters, a monitoring scheme for task operation is proposed. The scheme collects data through log monitoring, transfers it over the SNMP protocol, and integrates with the ROSS monitoring platform through the Icinga extension mechanism, which completes end-to-end monitoring of task operation on the big data platform.

(4) To meet the demand for dynamic expansion of cluster resources, a scalable monitoring framework is designed. The framework is based on the Icinga plugin expansion strategy combined with custom monitoring scripts. New monitoring indexes are added by configuring the mapping between each monitoring index and its monitoring plugin, so that different resources can be monitored as needed.
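To illustrate the plugin-based approach in items (2) and (4), the following is a minimal sketch of a check script in the style accepted by Icinga and other Nagios-compatible monitors. It is not the thesis implementation. The exit codes 0 to 3 and the `text | performance data` output line follow the common Nagios plugin convention. The names `COLLECTORS`, `run_check`, `disk_used_percent`, and the script name `check_index.py` are hypothetical and chosen only for illustration.

```python
import shutil
from typing import Callable, Dict, List, Tuple

# Exit codes used by Nagios-compatible plugins, which Icinga also accepts.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: float, warn: float, crit: float) -> int:
    """Return a plugin status for a metric where larger values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# Hypothetical mapping from monitoring index to the function that collects it.
# Supporting a new index means adding one entry here.
COLLECTORS: Dict[str, Callable[[], float]] = {
    "disk_used_percent": disk_used_percent,
}


def run_check(index: str, warn: float, crit: float,
              collectors: Dict[str, Callable[[], float]] = COLLECTORS) -> Tuple[int, str]:
    """Collect one index and return (exit code, plugin output line)."""
    collector = collectors.get(index)
    if collector is None:
        return UNKNOWN, f"UNKNOWN - no collector configured for '{index}'"
    try:
        value = collector()
    except Exception as exc:
        return UNKNOWN, f"UNKNOWN - {index} could not be collected: {exc}"
    status = evaluate(value, warn, crit)
    # Text before '|' is for humans; text after it is performance data.
    line = f"{STATUS_NAMES[status]} - {index}={value:.1f} | {index}={value:.1f};{warn};{crit}"
    return status, line


def main(argv: List[str]) -> int:
    """Arguments: <index> <warn> <crit>. Prints one line, returns the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_index.py <index> <warn> <crit>")
        return UNKNOWN
    try:
        warn, crit = float(argv[1]), float(argv[2])
    except ValueError:
        print("UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    status, line = run_check(argv[0], warn, crit)
    print(line)
    return status
```

A deployed script would finish with `sys.exit(main(sys.argv[1:]))` so that the monitoring agent reads the status from the process exit code. Keeping the index-to-collector mapping in one table mirrors the idea in item (4): a new resource is monitored by adding a mapping entry, without changing the check logic.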
Keywords/Search Tags: Cluster monitoring, Spark, Icinga, Performance monitoring, Log collection, Task monitoring