Research on System Log Anomaly Detection Method Based on Semi-Supervised Learning

Posted on: 2024-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Zhang
GTID: 2568306941984589 | Subject: Cyberspace security
Abstract/Summary:
With the rapid development of the internet industry, a growing variety of application software and operating systems has entered people's daily lives. System log files record the operating status and detailed runtime information of software and systems, and they are an important data source that operation and maintenance personnel use to inspect and handle system anomalies. To improve system security and reliability, researchers in China and abroad have designed a range of anomaly detection methods for system logs. Traditional log anomaly detection methods locate abnormal logs by keyword search or manually written matching rules, but they suffer from low detection efficiency and rules that are difficult to maintain, and they can no longer meet the requirements of today's environments. Machine learning was subsequently applied to log anomaly detection: methods such as decision trees, support vector machines, and principal component analysis extract more log-related features, which both improves performance and reduces the complexity of anomaly detection. However, these methods still struggle to analyze and extract the hidden relationships among features. Deep learning methods can effectively overcome this limitation, and recent research that uses LSTM, CNN, and GRU models to extract log features has achieved high performance. Nevertheless, these methods still have two problems. First, in semantic extraction, existing log semantic representation methods consider only the words of the log template themselves and ignore word order, which makes it difficult to fully capture the semantic information of a log. Second, in anomaly detection, existing methods are limited by the number of labeled logs: when only a small number of labeled logs are available for training, model performance is poor. This thesis investigates these issues and proposes two solutions.
(1) This thesis proposes LogST, a semi-supervised log anomaly detection method based on Sentence-BERT (SBERT), to address the problem that current log semantic representation methods consider only the semantics of individual words and ignore word order. LogST uses the SBERT model to extract the semantic information of the entire log template, taking into account both the semantics of each word in the template and the order of the words. The siamese network structure of LogST effectively measures the similarity between log templates. In a series of experiments, LogST outperforms existing unsupervised and semi-supervised anomaly detection methods, although a gap remains relative to the supervised method LogRobust. Even so, LogST saves part of the manual labeling cost and is more usable in practical applications.
(2) This thesis proposes LogALST, a semi-supervised log anomaly detection method based on active learning and self-training, to address the poor performance of anomaly detection models when few labeled logs are available. LogALST combines the strengths of both techniques. Self-training predicts labels for a large amount of unlabeled log data and keeps the high-confidence predictions, which reduces the manual labeling cost of active learning. Active learning uses a sampling strategy to select highly uncertain log samples for manual labeling, which reduces labeling errors when self-training predicts labels for unlabeled logs. In a series of experiments, LogALST outperforms existing semi-supervised log anomaly detection methods and also performs better than active learning or self-training used alone, achieving better anomaly detection performance with less manual labeling cost.
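The abstract does not describe LogST's architecture beyond SBERT embeddings and a siamese structure, so the following is only a minimal, stdlib-only sketch of the idea in (1): one shared encoder embeds both log templates, and cosine similarity compares the two embeddings. The encoder is a hypothetical stand-in for SBERT (word counts plus adjacent-word bigrams), chosen only to show why an order-aware representation separates templates that a bag-of-words representation treats as identical. The names `encode`, `cosine`, and `siamese_similarity` are illustrative, not taken from the thesis.

```python
import math
from collections import Counter


def encode(template: str, use_order: bool = True) -> Counter:
    """Shared encoder applied to BOTH inputs (the "siamese" part of this sketch).

    Hypothetical stand-in for SBERT: a sparse vector of word counts, plus
    adjacent-word bigrams when use_order=True so that word order matters.
    """
    words = template.lower().split()
    features = Counter(words)
    if use_order:
        features.update(f"{a}->{b}" for a, b in zip(words, words[1:]))
    return features


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def siamese_similarity(t1: str, t2: str, use_order: bool = True) -> float:
    """Embed both templates with the same encoder, then compare the embeddings."""
    return cosine(encode(t1, use_order), encode(t2, use_order))


if __name__ == "__main__":
    a = "connection closed by server"
    b = "server closed by connection"  # same words, different order
    print(siamese_similarity(a, b, use_order=False))  # bag of words: 1.0
    print(siamese_similarity(a, b, use_order=True))   # order-aware: 5/7, about 0.714
```

A real SBERT encoder produces dense vectors from a trained transformer rather than sparse counts, but the siamese pattern (one encoder, two inputs, one similarity score) is the same.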
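The abstract explains how LogALST divides work between its two components in (2), but not its model, features, or thresholds. The sketch below shows one hypothetical round of combined active learning and self-training under stated assumptions: a toy one-dimensional classifier (`fit`, `predict_proba`) stands in for the real log anomaly detector, the `oracle` callback stands in for a human labeler, and `conf_threshold` and `k_query` are illustrative parameters. The samples whose predicted probability is closest to 0.5 go to the oracle (uncertainty sampling); among the rest, samples whose predicted class probability reaches the threshold receive pseudo-labels (self-training); everything else stays in the unlabeled pool.

```python
import math
from typing import Callable, List, Tuple

Labeled = List[Tuple[float, int]]  # (feature, label), label 1 = anomaly


def fit(labeled: Labeled) -> float:
    """Hypothetical toy model: a threshold halfway between the two class means.

    Assumes both classes are present in the labeled set.
    """
    normal = [x for x, y in labeled if y == 0]
    anomalous = [x for x, y in labeled if y == 1]
    return (sum(normal) / len(normal) + sum(anomalous) / len(anomalous)) / 2


def predict_proba(threshold: float, x: float, scale: float = 2.0) -> float:
    """P(anomaly | x) as a logistic function of the distance past the threshold."""
    return 1.0 / (1.0 + math.exp(-scale * (x - threshold)))


def al_st_round(labeled: Labeled, unlabeled: List[float],
                oracle: Callable[[float], int],
                conf_threshold: float = 0.95, k_query: int = 2):
    """One round of active learning + self-training; returns (labeled, unlabeled)."""
    model = fit(labeled)
    scored = [(x, predict_proba(model, x)) for x in unlabeled]
    # Active learning (uncertainty sampling): query the k samples closest to p = 0.5.
    scored.sort(key=lambda xp: abs(xp[1] - 0.5))
    queried = [(x, oracle(x)) for x, _ in scored[:k_query]]
    # Self-training: pseudo-label only the confident predictions among the rest.
    pseudo, remaining = [], []
    for x, p in scored[k_query:]:
        if max(p, 1.0 - p) >= conf_threshold:
            pseudo.append((x, int(p >= 0.5)))
        else:
            remaining.append(x)
    return labeled + queried + pseudo, remaining


if __name__ == "__main__":
    seed = [(0.0, 0), (10.0, 1)]  # tiny hand-labeled seed set
    pool = [1.0, 4.8, 5.3, 6.0, 9.0]
    new_labeled, pool = al_st_round(seed, pool, oracle=lambda x: int(x > 5.0))
    print(new_labeled)  # 4.8 and 5.3 queried; 1.0 and 9.0 pseudo-labeled
    print(pool)         # [6.0]: not among the k most uncertain, not confident enough
```

In this sketch the oracle's answers are the only manual labels, so `k_query` bounds the labeling cost per round, while `conf_threshold` trades the number of pseudo-labels against the risk that some of them are wrong. The abstract does not state how the thesis sets either value.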
Keywords/Search Tags: Log Anomaly Detection, Semi-Supervised Learning, Semantic Extraction, Active Learning, Self-Training