A Multi-type Failure Event Prediction Method Based On System Log Clustering

Posted on: 2019-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: W H Wang
Full Text: PDF
GTID: 2428330545486955
Subject: Computer software and theory
Abstract/Summary:
As cluster systems grow in size and complexity, failures have become the norm rather than the exception, with a serious impact on system performance and operating costs. Accurate failure prediction can optimize a variety of fault-tolerance processes and so mitigate the impact of failures on the system. System logs record the events that occur in the system and are an important source for understanding system behavior; they can be used for failure analysis and prediction to improve system reliability. This dissertation studies failure prediction based on system logs, and its main contributions are as follows:

(1) This dissertation presents a method for predicting multiple types of failure events. A two-class prediction method can only predict whether a failure will occur in the future; it cannot provide information about the failure itself. This dissertation first classifies and labels log events based on their log messages, which describe the events that occurred in the system, so that a predicted failure event carries a description of the failure. Secondly, for each type of failure event, the corresponding set of log event sequences is clustered separately to mine the frequent event sequences associated with that type of failure, which is what allows multiple types of failure events to be predicted. Finally, failure event prediction rules are generated from the frequent event sequences associated with each failure event, and the use of these rules to predict failure events is described, yielding a complete method for predicting multiple types of failure events.

(2) The hierarchical clustering algorithm is improved, according to the characteristics of log event sequences, to mine the frequent event sequences of each type of failure event effectively. First, the Longest Common Subsequence (LCS) between log event sequences is used as the measure of sequence similarity. Secondly, the hierarchical clustering algorithm is improved in two ways: (a) In the merging step, instead of selecting only the two closest clusters for merging, a similarity threshold is set and any two clusters that satisfy it are merged; the threshold gradually decreases as merging proceeds. (b) After each round of merging, if the number of event sequences in a cluster exceeds a certain threshold, the cluster is placed in the final result set and no longer participates in subsequent merges. These two improvements not only optimize the clustering of log event sequences, so that better frequent event sequences are obtained, but also effectively reduce the time required for clustering.

(3) Experiments were conducted on real system log data generated by two supercomputers, Thunderbird and Spirit, to verify the effectiveness of the proposed method.
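
The clustering procedure in contribution (2) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The abstract does not say how the LCS length is normalized, how similarity between clusters is computed, or how the threshold is lowered, so the choices below are assumptions: LCS length divided by the longer sequence's length, average linkage between clusters, and an explicit decreasing list of thresholds. The names `lcs_length`, `similarity`, `cluster_similarity`, `threshold_clustering`, `thresholds`, and `max_size` are all hypothetical.

```python
from typing import List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the Longest Common Subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """Assumed normalization: LCS length over the longer sequence's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def cluster_similarity(c1: List[Sequence], c2: List[Sequence]) -> float:
    """Assumed linkage: mean pairwise similarity between two clusters."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def threshold_clustering(
    sequences: List[Sequence],
    thresholds: Sequence[float] = (0.9, 0.7, 0.5),  # assumed decreasing schedule
    max_size: int = 2,                              # assumed size cutoff
) -> List[List[Sequence]]:
    active = [[s] for s in sequences]  # every sequence starts as its own cluster
    finished: List[List[Sequence]] = []
    for threshold in thresholds:
        if len(active) < 2:
            break
        # (a) Merge any pair of clusters that satisfies the current threshold,
        #     rather than only the single closest pair.
        merged = True
        while merged:
            merged = False
            for i in range(len(active)):
                for j in range(i + 1, len(active)):
                    if cluster_similarity(active[i], active[j]) >= threshold:
                        active[i] = active[i] + active[j]
                        del active[j]
                        merged = True
                        break
                if merged:
                    break
        # (b) After the round, clusters larger than max_size go to the result
        #     set and take no part in later merges.
        finished.extend(c for c in active if len(c) > max_size)
        active = [c for c in active if len(c) <= max_size]
    return finished + active


# Made-up event-ID sequences, for illustration only.
sequences = [
    ["A", "B", "C"],
    ["A", "B", "D", "C"],
    ["A", "B", "C", "E"],
    ["X", "Y", "Z"],
    ["X", "Y", "W", "Z"],
]
clusters = threshold_clustering(sequences)
for cluster in clusters:
    print(cluster)
```

With this made-up input, the three sequences built around A, B, C end up in one cluster and the two built around X, Y, Z in another. Retiring large clusters early shrinks the number of pairwise comparisons in later rounds, which is consistent with the reduction in clustering time that the abstract reports.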
Keywords/Search Tags: System Log, Failure Event Prediction, Hierarchical Clustering, Supercomputer