Font Size: a A A

System Log Anomaly Detection System Based On Hierarchical Clustering And CNN-text

Posted on:2021-04-24Degree:MasterType:Thesis
Country:ChinaCandidate:Z X ZengFull Text:PDF
GTID:2518306245982089Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet industry,large-scale distributed systems are widely used in various service fields and gradually become the core components of the IT industry.The increasing volume and complexity of the system brings many challenges to system maintenance.How to detect and prevent possible system failures in time has become the focus of research.System logs exist in all kinds of systems,and are widely used to record hardware,software and system problems in the system.When the system is invaded,it will also leave traces in the log,which plays an important role in system maintenance.However,the traditional keyword-based retrieval and rule-based regular expression retrieval are no longer applicable in the case of complex log data and large amount of data.Monitor the system by monitoring the system log,timely detect anomalies and alarm to bring new ideas for the operation and maintenance of the system.Machine learning and deep learning have been hot topics in recent years,which have been widely used in the field of text,image and speech recognition.Machine learning and deep learning are used to learn a large number of data sets.Finally,the recognition and prediction of specific situations are completed.However,machine learning methods pay more attention to feature engineering,and whether effective features can be extracted becomes a key factor in the accuracy of the model,while deep learning can automatically extract features through autonomous learning of the original data set.effectively reduce the impact of feature engineering,so the application of deep learning model in the field of system log anomaly detection is expected to break the limitations and promote the development of system anomaly detection.This paper mainly pays attention to the characteristics of distributed system logs,applies hierarchical clustering and convolution neural network to the field of text processing,and designs and implements a system log anomaly detection system based on hierarchical clustering and CNN-text.In the system,heuristic rules are mainly used to analyze the logs,and then the hierarchical clustering method is used to cluster the original logs and manually label the data at the center of each class,the purpose of clustering is to reduce the workload of manual labeling.In the model construction stage,we first use the word embedding method to vectorize the text data,and finally put into the convolution neural network for training,in order to train a model that can efficiently and accurately detect anomalies through the log.The system log anomaly detection system based on volume hierarchical clustering and CNN-text mainly includes the following modules: data acquisition and preprocessing module,log clustering and visualization module,model training module,anomaly detection module.In this paper,hierarchical clustering and convolution neural network are combined and applied to the field of system anomaly detection.The final experimental results show that the model used in this paper has high accuracy and low false alarm rate,and has advantages in dealing with unbalanced data.It is proved that the convolution neural network has good performance in time and performance.
Keywords/Search Tags:anomaly detection, log analysis, hierarchical clustering, word embedding, convolution neural network
PDF Full Text Request
Related items