Research On Log Anomaly Detection And Diagnosis

Posted on: 2021-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R P Yang
GTID: 1368330623482170
Subject: Information and Communication Engineering
Abstract/Summary:
The scale of network applications and systems continues to grow, and the network environment is increasingly complex, which places higher requirements on anomaly detection. Common anomaly detection approaches based on traffic, malware, or social network security events each have their own limitations. Selecting an appropriate data source for network and system security anomaly detection is therefore an important issue. Log data is rich in information, so it is valuable to use it to analyze network behavior and system state, detect anomalies, and limit avoidable losses in time. Log anomaly detection offers good complementarity and adaptability. At the same time, because logs are heterogeneous, multi-source, and large in volume, log anomaly detection is both a hotspot and a difficult problem in anomaly detection research.

Aiming to improve existing methods of log anomaly detection and diagnosis, and to improve detection based on log sequences, this dissertation departs from the conventional treatment of logs as general text objects. Through in-depth study of machine learning and deep learning methods suited to this task, it forms a relatively complete theoretical system for studying logs as sequence objects. It identifies the main problems in each module of the log anomaly detection and diagnosis process, proposes corresponding solutions, and arrives at a systematic body of research results. The work extends the research objects of anomaly detection to log sequences, enriches the applications of log mining, and improves the method architecture of log sequence anomaly detection. The research has scientific significance for developing a general log sequence anomaly detection framework and practical value for diagnosing network and system anomalies. The dissertation follows the research approach of “reviewing related work, discovering problems, analyzing problems, establishing models, experimental verification, and solving problems”, and mainly does the following work:

First, existing research on log template extraction relies too heavily on the log format. It does not address template extraction from multi-source logs, and most methods work offline. Offline algorithms consume a lot of memory and cannot meet real-time needs. In view of these shortcomings, an online log template extraction method (Online Hierarchical Clustering for Log Templates Extraction, LogOHC) is proposed. The method first preprocesses the raw log messages and uses distributed word representations (word2vec) to vectorize the log messages online. It then adds leaves incrementally to the clustering tree to form a new clustering tree, optimizes the new tree, and finally generates the templates. The experimental analysis shows that LogOHC achieves a higher F1-score than existing log template extraction methods, is suitable for multi-source log datasets, and has a short single-step execution time, so it can meet the requirements of online real-time processing.

Secondly, traditional methods for log anomaly detection do not consider the temporal relationship between log messages and target only a specific type of log. In addition, the few existing studies on log sequences have limited capability and insufficient detection accuracy for shorter log sequences. This dissertation proposes a general log sequence anomaly detection framework based on the attention mechanism. The framework models the log template sequence as a natural language sequence and uses word embeddings trained by a neural network as the model input. This represents the semantics of each target word in the current log sequence well and also reduces dimensionality, which speeds up the whole framework. The stacked Long Short-Term Memory (LSTM) layers effectively extract the implicit patterns of the log sequence. The attention layer captures dependencies by computing the similarity between the logs in a sequence, and so better learns the internal structure of the sequence. The model is trained on normal log sequence patterns so that it can detect unknown anomalies. The experimental results show that the overall accuracy of the framework for log sequence anomaly detection is better than that of existing methods, with lower time overhead.

Thirdly, the log sequences generated by some new types of attacks contain long-term dependencies. Existing log sequence anomaly detection models based on recurrent neural networks detect anomalies in shorter sequences well, but their detection capability for long sequences is insufficient. A general log sequence anomaly detection model based on the temporal convolutional network is proposed. The model replaces ReLU with parameterized ReLU and replaces the fully connected layer with an adaptive average pooling layer. The improved structure addresses two problems in the original temporal convolutional network: the activation function tends to cause "dead" neurons that can no longer learn effective features from the log sequence, and the large number of parameters in the fully connected layer tends to cause over-fitting. The experimental results show that although convergence is slower, accuracy improves, and the overall accuracy of the detection model is better than that of existing methods.

Fourthly, solving the log anomaly detection problem with a supervised method requires many people with domain knowledge to label a large amount of data. In addition, existing unsupervised research on log sequence anomaly detection is not accurate enough, and its anomaly detection strategies consider only a single factor. To address this problem, a log sequence anomaly detection method based on the Generative Adversarial Network (GAN) is proposed. Its basic assumption is: “Anomalies rarely appear, but normal data is easy to obtain”. A GAN can generate samples that approximate the distribution of the training data, and building the generator and discriminator on the LSTM model enables it to model sequence tasks well. Normal log sequences are input to the model for training, and the adversarial training process gives the model a strong ability to generate normal log sequences. In the test phase, a linear weighting of the generator's reconstruction error and the discriminator's discrimination error is used as the criterion for judging whether a log sequence is anomalous. The experimental results show that the detection model performs better than methods based on PCA and AutoEncoder, is stable, and is not sensitive to sequence length, so it can cope well with different anomaly detection tasks.

Fifthly, although an anomaly detection model can locate the anomalous position, the diagnosis is not ideal because the underlying event is not visible. In addition, existing anomaly diagnosis methods cannot construct the workflow graph well for interleaved logs generated by multiple workflows. A control flow graph mining method with transition probabilities is proposed. By constructing a control flow graph over the log templates, the original log messages are converted into structured events. The method takes the log templates as input. The nodes of the generated control flow graph represent different log templates, and the directed edges represent the execution order of the log templates, reflecting the dependencies between them. The timestamp information of the log messages is an important reference for dividing events. The experimental analysis shows that the proposed control flow graph mining method helps to diagnose the cause of an anomaly.
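As a rough illustration of the last contribution, the sketch below builds a control flow graph with transition probabilities from sequences of log template IDs. It is not the dissertation's algorithm. It assumes the logs are already split into per-session template sequences (the hypothetical `sessions` input), it ignores interleaving and timestamps, and it estimates each edge probability as the relative frequency of consecutive template pairs. The function names and the `min_prob` threshold are likewise assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_control_flow_graph(sessions):
    """Return {source: {target: probability}} from template-ID sequences.

    Assumption: each session is an ordered list of template IDs that
    belongs to a single workflow (no interleaving).
    """
    counts = defaultdict(Counter)
    for seq in sessions:
        # Count every pair of consecutive templates as one directed edge.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        # Normalize outgoing edge counts into transition probabilities.
        graph[src] = {dst: n / total for dst, n in targets.items()}
    return graph


def unexpected_transitions(graph, seq, min_prob=0.1):
    """List (position, src, dst) for transitions with probability below min_prob."""
    flagged = []
    for i, (src, dst) in enumerate(zip(seq, seq[1:])):
        if graph.get(src, {}).get(dst, 0.0) < min_prob:
            flagged.append((i, src, dst))
    return flagged


# Hypothetical template IDs, for illustration only.
sessions = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]
cfg = build_control_flow_graph(sessions)
suspicious = unexpected_transitions(cfg, ["open", "read", "open", "close"])
```

In this toy input the transitions `read -> open` and `open -> close` never occur in the training sessions, so they are the ones flagged as positions to inspect during diagnosis.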
Keywords/Search Tags: Log Anomaly Detection, Log Anomaly Diagnosis, Log Template Extraction, Recurrent Neural Network, Self-Attention Mechanism, Temporal Convolutional Network, Generative Adversarial Networks, Control Flow Graph