
Anomaly Detection And Diagnosis Method Of Software Based On The Log Analysis

Posted on: 2022-02-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B M Wang
GTID: 1488306497489864
Subject: Computer software and theory
Abstract/Summary:
The scale and complexity of software systems keep increasing. Analyzing log data, which records the operating status of a software system, can help developers detect and identify anomalies quickly and accurately. However, log data have a non-uniform format, a large volume, and an imbalanced class distribution. As a result, log parsing is inefficient, log vectors are high-dimensional, and both the accuracy and the efficiency of log anomaly detection and diagnosis methods need to be improved. To address these problems, this thesis proposes an anomaly detection and diagnosis method based on log analysis. The main research contents are as follows.

(1) This thesis proposes Gram-FPM, a log parsing method, together with a low-dimensional feature extraction method. Gram-FPM uses the N-gram algorithm to divide similar logs into groups, and then applies frequent pattern mining to quickly extract log event templates from each group of similar logs, achieving efficient log parsing. The feature extraction method uses the MinHash algorithm to transform the word-frequency matrix and defines rules for extracting features. The dimension of a word-frequency-based log vector is thereby reduced from the vocabulary size of the Bag-of-Words model to the number of transformed word-frequency matrices, yielding low-dimensional log vectors.

(2) This thesis proposes a log-based anomaly detection method using clustering and an MVP tree. The method uses a K-Nearest Neighbor (KNN) algorithm improved with a weighting technique to raise the accuracy of KNN-based anomaly detection. A clustering algorithm first constructs the labeled sample set that KNN requires, and an MVP tree is then built over the labeled log samples. Searching for neighbors in this ordered tree effectively reduces the number of samples that must be compared and makes neighbor search more efficient, thereby achieving efficient anomaly detection.

(3) This thesis proposes a multi-type anomaly diagnosis method based on spectral clustering and Neighbor Weighted ensemble rules. To handle the imbalance of log data, the method applies three steps (labeling, sampling, and reorganization) to the clustering results: it draws the same number of samples from each class and recombines them into a balanced log sample set. An ensemble learning method then performs multi-type anomaly diagnosis on the balanced data set. Because existing ensemble rules cannot accurately classify samples at category boundaries, a set of Neighbor Weighted ensemble rules is proposed to improve the classification accuracy of boundary samples, achieving high-accuracy multi-type anomaly diagnosis.
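Contribution (1) describes Gram-FPM only at the level of two stages: N-gram grouping of similar logs, then frequent pattern mining of event templates. The sketch below is a minimal illustration under stated assumptions, not the thesis implementation: logs are whitespace-tokenized, a log joins a group when it has the same length as the group's first log and the Jaccard similarity of their bigram sets reaches a hypothetical `threshold`, and counting the most frequent token at each position stands in for a full frequent-pattern miner such as FP-growth. The names `group_by_ngrams`, `extract_template`, `threshold`, and `min_support` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return the set of n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_ngrams(logs, n=2, threshold=0.3):
    """Hypothetical grouping rule (assumption): a log joins the first group whose
    first log has the same token count and a bigram Jaccard similarity >= threshold."""
    groups = []  # list of (representative n-gram set, list of member token lists)
    for line in logs:
        tokens = line.split()
        grams = ngrams(tokens, n)
        for rep, members in groups:
            if len(members[0]) == len(tokens) and jaccard(rep, grams) >= threshold:
                members.append(tokens)
                break
        else:  # no existing group matched: start a new one
            groups.append((grams, [tokens]))
    return [members for _, members in groups]


def extract_template(members, min_support=1.0):
    """Stand-in for frequent pattern mining (assumption): keep the most common
    token at each position if it occurs in at least min_support of the group's
    logs, otherwise emit the wildcard <*>. All members have equal length."""
    template = []
    for i in range(len(members[0])):
        tok, count = Counter(tokens[i] for tokens in members).most_common(1)[0]
        template.append(tok if count / len(members) >= min_support else "<*>")
    return " ".join(template)
```

With the default `min_support=1.0`, a position becomes `<*>` as soon as its token varies within the group, so `Connection from 10.0.0.1 closed by peer` and `Connection from 10.0.0.2 closed by peer` collapse to the template `Connection from <*> closed by peer`.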
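For the feature-extraction step in contribution (1), the abstract says only that MinHash transforms the word-frequency matrix under extraction rules it does not spell out. The sketch below shows standard MinHash signatures as one way the vector length can stop depending on vocabulary size: each log's word-frequency vector is reduced to the set of vocabulary indices with a nonzero count, and each of `num_hashes` universal hash functions of the form (a*i + b) mod p contributes one minimum. The hash family, `PRIME`, and the function names are assumptions for illustration, not the thesis' own rules.

```python
import random

# Assumption: Mersenne prime 2^31 - 1, larger than any vocabulary index used here.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hypothetical hash functions h(i) = (a*i + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(freq_vector, params):
    """freq_vector: word counts indexed by vocabulary position.
    Returns one minimum hash value per (a, b) pair, so the output length is
    len(params) regardless of the vocabulary size."""
    present = [i for i, count in enumerate(freq_vector) if count > 0]
    if not present:
        return [PRIME] * len(params)  # sentinel signature for a log with no words
    return [min((a * i + b) % PRIME for i in present) for a, b in params]


def estimated_jaccard(sig1, sig2):
    """Fraction of positions where two signatures agree (Jaccard estimate)."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

The fraction of agreeing signature positions estimates the Jaccard similarity of the two word sets. Because each hash is injective on indices below `PRIME` (a is nonzero modulo a prime), two logs that share no words always get an estimate of 0.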
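Contribution (2) improves KNN with a weighting technique but does not name the weights. A common choice, assumed here, is inverse-distance weighting: each of the k nearest labeled samples votes with weight 1/(distance + eps), so one very close anomaly can outweigh several distant normal samples. `weighted_knn_predict` and `eps` are hypothetical names, and the labeled set is passed in directly rather than produced by the clustering step the thesis uses.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """train: list of (vector, label) pairs.
    Assumption: inverse-distance weights; each of the k nearest neighbours adds
    1 / (distance + eps) to its label's score, and the top-scoring label wins."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    scores = defaultdict(float)
    for vector, label in neighbours:
        scores[label] += 1.0 / (euclidean(vector, query) + eps)
    return max(scores, key=scores.get)
```

For example, with `train = [((0.0, 0.0), "normal"), ((0.0, 1.0), "normal"), ((5.0, 5.0), "anomaly")]` and `k=3`, a plain majority vote labels the query `(4.9, 4.9)` as normal, while the weighted vote returns `"anomaly"` because the anomalous sample is far closer than both normal ones.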
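The neighbor-search speedup in contribution (2) comes from an MVP (multi-vantage-point) tree. The sketch below uses the simpler vantage-point (VP) tree, which the MVP tree extends with several vantage points per node, as a stand-in rather than the thesis' data structure. Each node stores a vantage point and the median distance `radius` to the remaining points; during a k-nearest-neighbor search the triangle inequality lets the search skip any subtree that cannot contain a point closer than the current k-th best distance. `VPNode`, `build_vp_tree`, and `knn_search` are hypothetical names, and Euclidean distance is assumed.

```python
import heapq
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


class VPNode:
    """One node of a vantage-point tree (hypothetical stand-in for an MVP-tree node)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build_vp_tree(points):
    """Use the first point as vantage point and split the rest at the median distance."""
    if not points:
        return None
    vantage, rest = points[0], points[1:]
    if not rest:
        return VPNode(vantage, 0.0, None, None)
    dists = [euclidean(vantage, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(rest, dists) if d < radius]
    outside = [p for p, d in zip(rest, dists) if d >= radius]
    return VPNode(vantage, radius, build_vp_tree(inside), build_vp_tree(outside))


def knn_search(root, query, k):
    """Return the k points nearest to query (k >= 1), closest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, tiebreak, point)
    tiebreak = 0

    def tau():
        # Current k-th best distance; infinite until k candidates are known.
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node):
        nonlocal tiebreak
        if node is None:
            return
        d = euclidean(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, tiebreak, node.point))
            tiebreak += 1
        elif d < tau():
            heapq.heapreplace(heap, (-d, tiebreak, node.point))
            tiebreak += 1
        if d < node.radius:
            visit(node.inside)
            if d + tau() >= node.radius:  # outside shell may still hold a closer point
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:  # inside ball may still hold a closer point
                visit(node.inside)

    visit(root)
    return [point for _, _, point in sorted(heap, reverse=True)]
```

On any point set the result should match a brute-force sort by distance; the tree only reduces how many distances are computed, which is the kind of efficiency gain the abstract attributes to the ordered tree.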
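The labeling, sampling, and reorganization steps of contribution (3) can be sketched as follows, assuming the cluster label of each log (for example, from spectral clustering) is already known. Each class contributes exactly `n_per_class` samples, drawn without replacement when the class is large enough and with replacement otherwise; that fallback for small classes is an assumption, since the abstract does not say how minority classes are handled. `balance_by_cluster` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_cluster(samples, labels, n_per_class, seed=0):
    """Return a shuffled list of (sample, label) pairs with n_per_class per label.
    Labeling: group samples by cluster label.
    Sampling: draw n_per_class from each group (with replacement only when the
    group is smaller than n_per_class, a hypothetical choice).
    Reorganization: concatenate the draws and shuffle them."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    balanced = []
    for label, group in by_label.items():
        if len(group) >= n_per_class:
            drawn = rng.sample(group, n_per_class)
        else:
            drawn = rng.choices(group, k=n_per_class)
        balanced.extend((s, label) for s in drawn)
    rng.shuffle(balanced)
    return balanced
```

For instance, 100 normal logs, 7 disk-error logs, and 3 network-error logs with `n_per_class=10` yield 30 samples, 10 per class, so the majority class no longer dominates training of the ensemble.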
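The abstract does not define the Neighbor Weighted ensemble rules of contribution (3), so the sketch below is only one plausible reading, labeled as an assumption: each base classifier's vote for a query is weighted by its accuracy on the k labeled validation samples nearest to that query. A classifier that is reliable near a category boundary therefore dominates there, even where plain majority voting would outvote it. `neighbor_weighted_vote` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def neighbor_weighted_vote(classifiers, validation, query, k=3):
    """classifiers: callables mapping a feature vector to a label.
    validation: list of (vector, true_label) pairs.
    Hypothetical rule: each classifier's vote counts with a weight equal to its
    accuracy on the k validation samples nearest to the query."""
    neighbours = sorted(validation, key=lambda item: euclidean(item[0], query))[:k]
    scores = defaultdict(float)
    for clf in classifiers:
        local_acc = sum(clf(x) == y for x, y in neighbours) / len(neighbours)
        scores[clf(query)] += local_acc
    return max(scores, key=scores.get)
```

For example, with one classifier that splits at x = 0.5 and two that split at 0.75 and 0.8, majority voting labels the query `(0.6, 0.0)` as `"A"`. If the nearest validation samples lie just right of 0.5 and are labeled `"B"`, only the first classifier earns any local accuracy, and the rule returns `"B"`.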
Keywords/Search Tags:log data parsing, anomaly detection and diagnosis, Spectral clustering, K-Nearest Neighbor, ensemble learning