Research And Design Of Fault Diagnosis Method For High-Performance Server

Posted on: 2012-08-15  Degree: Master  Type: Thesis
Country: China  Candidate: J Liang  Full Text: PDF
GTID: 2218330362450473  Subject: Computer Science and Technology
Abstract/Summary:
High-performance servers are widely used in banking, military, aerospace, meteorological services, and other fields. In these areas they handle critical business, so system data loss or an abnormal shutdown can lead to serious consequences. The availability of high-performance servers is therefore increasingly important, and high availability requires efficient fault detection, fault diagnosis, and fault recovery techniques.

High-performance servers are generally implemented as clusters, because a cluster compares favorably with other system forms in price, scalability, and similar respects. This thesis studies a fast and efficient fault diagnosis system, based on the operating system, for high-performance cluster servers. So that the system can provide uninterrupted service, it first studies fault monitoring methods and then, building on them, fault diagnosis methods. The goal is to obtain as large a fault detection coverage as possible with as little overhead as possible, and to achieve diagnosis at service-level and node-level granularity.

In terms of process, the diagnosis is divided into fault monitoring and fault diagnosis; in terms of implementation granularity, it is divided into self-fault diagnosis and system-level fault diagnosis. The system monitors hardware status information of the node, such as CPU, memory, network equipment, and power, and it monitors the node's operating system process information, including core system service processes and user-configured processes. Diagnosis rules were designed for the different kinds of monitoring information in order to achieve high availability, and a rapid self-fault diagnosis method was implemented. A heartbeat module and system-level fault diagnosis were also designed: through the heartbeat mechanism, one node can detect the failure of other nodes.

With the diagnostic system, accurate and timely diagnosis results can be obtained and a failed node can be isolated from the system, improving availability.
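
The heartbeat idea described above can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class name `HeartbeatMonitor`, the 3-second timeout, and the node names are all hypothetical, and timestamps are passed in explicitly so the example is deterministic. It only shows the general technique of declaring a peer failed when no heartbeat has arrived within a timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node (hypothetical helper)."""

    timeout: float  # seconds of silence before a peer is suspected failed (assumed value)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat message from `node` arrives.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> List[str]:
        # A node is suspected failed if its last heartbeat is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=2.0)  # node-a keeps beating, node-b goes silent
print(monitor.failed_nodes(now=4.0))  # ['node-b']
```

In a running cluster the `now` values would typically come from a monotonic clock such as `time.monotonic()`, and a node reported by `failed_nodes` would be a candidate for isolation from the system.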
Keywords/Search Tags: cluster, monitoring, fault diagnosis, heartbeat detection