
Design And Implementation Of Fault Log Analysis System For High-Performance Fault-Tolerant Computer

Posted on: 2012-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: C H Wei
GTID: 2218330362451671
Subject: Computer technology
Abstract/Summary:
High-performance fault-tolerant computers are widely used in finance, telecommunications, energy, transportation, aviation, and other key national business sectors. These sectors place high demands on management capacity and availability, because system delays and failures can cause immeasurable losses. The processing capacity and availability of such systems must therefore be evaluated, and fault injection is an effective technique for doing so. Logs record the running state of the system. Analyzing them makes it possible to verify the effectiveness of fault injection and of the fault tolerance mechanisms, and extracting fault records from them makes it possible to build a fault library for high-performance fault-tolerant computers, which in turn provides abundant data for analyzing failure distribution, fault propagation, and fault prediction.
This paper first surveys existing log analysis tools and the state of research on high-performance computer failures in China and abroad, and finds that: 1) there are no log management and analysis tools designed for high-performance fault-tolerant computers, and because existing log analysis tools carry out statistical analysis locally, the analyzed server serves its other users less efficiently; 2) experience in researching high-performance computer faults is lacking, and the available fault libraries are limited. Therefore, building on an in-depth study of data mining technology, this paper designs and implements an automated and intelligent fault log analysis system for high-performance fault-tolerant computers. The system aims to extract and format faults from the various types of log files on such systems, to establish an initial fault library for high-performance fault-tolerant computers, and to greatly improve the efficiency with which the analyzed system serves its other users.
On this basis, the paper introduces extreme value theory and data fitting methods and proposes a scheme for establishing fault models based on the fault log analysis system.
To perform statistical analysis of many high-performance fault-tolerant computers at the same time, the paper brings the Software Testing Automation Framework (STAF) into the design of the log analysis system and successfully builds a distributed experimental environment for two high-performance fault-tolerant computers, the HP RX6600 and the Superdome. The effectiveness of the log analysis system is verified through a detailed analysis of the experimental results on the Superdome. Finally, the fault log analysis system is applied to the LANL data set of the Computer Failure Data Repository (CFDR) built by CMU, and the time between failures (TBF) of a specific fault type is modeled using the fault model establishment scheme described above.
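The pipeline summarized above (extract faults from logs, format them, then model the TBF of one fault type) can be illustrated with a minimal sketch. It is not the thesis implementation. The log line layout, the function names, and the choice of an exponential distribution fitted by maximum likelihood are all assumptions made for illustration; the abstract does not state which log formats, distributions, or estimators the system uses.

```python
import re
from datetime import datetime

# Hypothetical log layout (an assumption, not taken from the thesis):
#   "YYYY-MM-DD HH:MM:SS LEVEL component: message"
# Only ERROR and FATAL lines are treated as faults.
LINE_RE = re.compile(r"^(\S+ \S+) (ERROR|FATAL) (\w+): (.*)$")


def extract_faults(lines):
    """Return (timestamp, component, message) tuples, sorted by timestamp."""
    faults = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            faults.append((stamp, match.group(3), match.group(4)))
    return sorted(faults, key=lambda fault: fault[0])


def time_between_failures(faults, component=None):
    """Hours between consecutive faults, optionally for one component only."""
    times = [f[0] for f in faults if component is None or f[1] == component]
    return [(b - a).total_seconds() / 3600.0 for a, b in zip(times, times[1:])]


def fit_exponential_rate(tbf_hours):
    """Maximum-likelihood rate of an exponential model: 1 / mean(TBF)."""
    if not tbf_hours:
        raise ValueError("need at least two faults to compute a TBF sample")
    return len(tbf_hours) / sum(tbf_hours)
```

With a list of log lines in the assumed layout, `fit_exponential_rate(time_between_failures(extract_faults(lines), "mem"))` gives the estimated failure rate per hour for the hypothetical `mem` component. An exponential model is only the simplest baseline; a heavier-tailed distribution could be swapped in at the last step without changing the extraction code.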
Keywords/Search Tags: High-performance Fault-tolerant Computer, Log Analysis, Data Mining, Extreme Value Theory