The Research And Implementation Of Cluster Monitoring System Based On Linux/UNIX

Posted on: 2009-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wu
Full Text: PDF
GTID: 2178360242466698
Subject: Computer application technology
Abstract/Summary:
The rapid development of PCs, workstations, and high-performance networks has driven the evolution of high-performance supercomputers from mainframes to computer clusters. However, because clusters are loosely structured and their nodes are highly independent, they are hard to maintain. The common solution is to build a cluster management system on top of the operating systems of the node machines. The cluster monitoring system is an important component of the cluster management system. It is mainly responsible for monitoring the system's performance indicators and raising an alarm when an exception occurs.

Although existing cluster monitoring systems offer rich functionality and good performance, they still have the following drawbacks: 1) Most of them use a C/S (client/server) architecture, so the system cannot obtain a node's information when the software on that node throws an exception. 2) The monitoring information is not fully used to predict system exceptions. 3) When an exception happens with no one on site, some systems already use email or SMS to alert the person responsible, but they cannot confirm whether the message was actually received.

Based on a study of several typical existing cluster monitoring systems, this thesis designs a three-layer communication model based on C/M/S and a half-asynchronous communication protocol, and completes the overall design of the cluster monitoring system. It implements ACMS (Automatic Cluster Monitoring System) on Linux/Unix through modules for data collection, streaming data mining, and SMS sending and receiving.

The main work of this thesis is as follows: 1) Using the telnet protocol, it implements a scheme for obtaining monitoring information from nodes that have failed to connect to the server machine. 2) It applies streaming data mining to cluster monitoring and designs a streaming data mining algorithm that predicts possible exceptions of the cluster system and their probabilities, and sets alerts according to the predictions. 3) It designs an SMS alert system and a human-machine interaction protocol for the system administrator; when a short message is detected as lost, the system resends it, which improves robustness.

The major modules of ACMS (the data collection unit, the streaming data mining unit, and the short message sending and receiving unit) were tested, which verified their feasibility and functionality. All ACMS modules perform their intended functions, address the drawbacks of existing cluster monitoring systems, improve system reliability, and make the system more intelligent.

ACMS is already in operation, monitoring the AFC (Automatic Fare Collection) server of Shanghai Metro Line 3, Line 4, and Line 5, which is developed and maintained by Huahong smart card Limited Corporation in Shanghai. Because the streaming data mining module for alerting requires large quantities of real data for repeated testing and validation, it is still in its testing period. Practice shows that ACMS reliably performs the monitoring and alarm functions of a cluster monitoring system.
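
The resend behaviour of the SMS alert module can be illustrated with a minimal Python sketch. It is not the ACMS implementation: the abstract does not describe the actual protocol, so the class name AlertSender, the send and poll_ack callbacks, and the retry and timeout values are all hypothetical. The sketch only assumes that an alert counts as lost when no acknowledgement arrives within a timeout, and that a lost alert is sent again.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertSender:
    """Resend an alert until it is acknowledged or the attempts run out.

    All names and defaults here are illustrative assumptions, not ACMS APIs.
    """
    send: Callable[[str, str], None]   # hypothetical transport: (alert_id, text)
    poll_ack: Callable[[str], bool]    # hypothetical: True once a reply for alert_id arrived
    max_attempts: int = 3              # assumed retry budget
    ack_timeout: float = 30.0          # assumed seconds to wait before treating the SMS as lost
    poll_interval: float = 1.0         # assumed seconds between acknowledgement checks
    sleep: Callable[[float], None] = time.sleep
    clock: Callable[[], float] = time.monotonic

    def deliver(self, alert_id: str, text: str) -> int:
        """Return the attempt number that was acknowledged, or 0 if none was."""
        for attempt in range(1, self.max_attempts + 1):
            self.send(alert_id, text)
            deadline = self.clock() + self.ack_timeout
            # Wait for the administrator's reply; on timeout, fall through and resend.
            while self.clock() < deadline:
                if self.poll_ack(alert_id):
                    return attempt
                self.sleep(self.poll_interval)
        return 0
```

The sleep and clock parameters are injected so that the waiting logic can be exercised with a simulated clock instead of real delays. A return value of 0 signals that every attempt went unacknowledged, at which point a real system would presumably escalate through another channel.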
Keywords/Search Tags: Cluster monitoring system, DCU, DMU, SMU, DAU, DSU