
The Key Research And Implementation Of Cluster Monitoring System For Large Scale Cluster

Posted on: 2011-08-13 | Degree: Master | Type: Thesis
Country: China | Candidate: A T Sun | Full Text: PDF
GTID: 2178330332956562 | Subject: Computer system architecture
Abstract/Summary:
Supercomputing is currently shifting from large, monolithic high-performance computers toward high-performance computer clusters, and cluster computing techniques have been developing rapidly. As cluster technology is applied more widely, the requirements for performance and usability keep rising, yet clusters are hard to manage because of their loose structure, highly independent nodes, complex network connections, and faults that are difficult to recover from. To address these problems, a cluster monitoring system is built on top of the nodes' operating systems. It is a very important piece of infrastructure in a large cluster system: its basic tasks are to obtain the cluster's configuration, to monitor the cluster's health and performance indicators, and to provide fault diagnosis for the cluster system.

Existing cluster monitoring systems, in China and abroad, offer good performance and usability, but some deficiencies remain:
(1) While the monitoring system is collecting information, users cannot obtain that information in a timely manner, and the collection itself introduces substantial system overhead, which adds to the load on the cluster.
(2) When a monitoring node fails, monitoring cannot be transferred automatically to another node, so the transmission of monitoring data fails and the reliability of the cluster monitoring system is reduced.

The main work of this thesis includes:
(1) Based on an analysis of techniques for collecting monitoring information, and combining the IEEE1394 protocol with Ganglia cluster monitoring technology, a cluster monitoring information-collection model is designed and implemented. This model not only reduces the load imposed by the cluster monitoring system but also effectively improves the monitoring information collection rate and the availability of the monitoring system.
(2) Based on an analysis of how cluster monitoring systems handle faults in monitoring nodes, a fault-tolerant cluster monitoring node model is proposed. This model guards against monitoring-node faults, strengthens the availability of the monitoring system, and improves the reliability of unattended monitoring.

The main performance indicators of the NGMON cluster monitoring system built on this work (monitoring information collection load, communication efficiency, and system fault handling) were tested, verifying the feasibility and practicality of both models. NGMON remedies the deficiencies of cluster monitoring systems described above and improves the performance, reliability, availability, and manageability of the cluster monitoring system.

This thesis is an important part of the Dalian Science and Technology Fund project "New web server resource management and monitoring system" (No. 2005J22JH031).
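The abstract does not say how the fault-tolerant monitoring node model detects a failed monitoring node or chooses its replacement. As a rough illustration of the general idea of automatic transfer only, the following minimal Python sketch assumes a heartbeat timeout and a fixed priority order among monitoring nodes. The names `MonitorNode`, `FailoverMonitor`, `heartbeat`, and `active` are hypothetical and are not taken from NGMON or Ganglia.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MonitorNode:
    """A monitoring node and the time of its most recent heartbeat (hypothetical type)."""
    name: str
    last_heartbeat: float = 0.0


@dataclass
class FailoverMonitor:
    """Picks the active monitoring node by priority (registration order).

    Assumption: a node counts as failed once no heartbeat has arrived
    for more than `timeout` seconds. The thesis abstract does not state
    its actual detection rule.
    """
    timeout: float
    nodes: List[MonitorNode] = field(default_factory=list)

    def heartbeat(self, name: str, now: float) -> None:
        """Record a heartbeat; unknown nodes are registered with lowest priority."""
        for node in self.nodes:
            if node.name == name:
                node.last_heartbeat = now
                return
        self.nodes.append(MonitorNode(name, now))

    def is_alive(self, node: MonitorNode, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def active(self, now: float) -> Optional[str]:
        """Return the highest-priority live node, or None if all have timed out."""
        for node in self.nodes:
            if self.is_alive(node, now):
                return node.name
        return None


if __name__ == "__main__":
    monitor = FailoverMonitor(timeout=5.0)
    monitor.heartbeat("primary", now=0.0)
    monitor.heartbeat("standby", now=0.0)
    print(monitor.active(now=1.0))   # both alive, so the first registered node is active
    monitor.heartbeat("standby", now=6.0)
    print(monitor.active(now=7.0))   # primary has been silent for 7 s, standby takes over
```

Time is passed in explicitly rather than read from a clock so that the takeover behaviour is easy to reason about; a real monitoring daemon would use a monotonic clock and receive heartbeats over the network.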
Keywords/Search Tags:cluster, cluster monitoring system, IEEE1394, fault-tolerant