With the continuous development of information technology, software and hardware products such as cloud servers and electronic devices have become increasingly important to people’s lives. System versions and hardware updates are increasingly adapted to people’s lives and usage habits, steadily improving user experience and sense of wellbeing. During system execution, technicians use system logs to record detailed runtime information, which is important for analyzing the operating status of systems, detecting abnormalities quickly and accurately, ensuring system stability, and reducing economic losses. However, as systems become more complex and diverse, the volume of system logs keeps growing, which makes it challenging for technicians to analyze anomalous logs. Researchers have done a great deal of work on log-based anomaly detection, but many problems remain, mainly in three areas: (1) There are many types of logs. Most raw system logs are semi-structured, and there is no uniform specification for the text and format used by different systems, so researchers have to develop different anomaly detection methods for different system logs. (2) Logs are unstable. New log message types and formats may appear because of system version updates, the evolution of log statements, and so on. If the anomaly detection model or knowledge base is not updated in time, new anomalous logs may go undetected. In addition, when logs are collected and parsed, poor parsing accuracy produces incorrect log templates, which introduces noise and further increases log instability. (3) Few system logs are labelled. Most system logs have no labels, yet deep learning-based log anomaly detection models need a large number of labelled logs for training in order to improve the accuracy of log feature extraction and the detection performance.

To
resolve these problems, this thesis proposes anomaly detection methods that do not require log parsing. The main research work covers the following two aspects:

1. Because few system logs are labelled, and in order to reduce computational cost and log instability, this thesis proposes an unsupervised anomaly detection method based on Word2Vec (Word to Vector) without log parsing. The method requires only simple pre-processing, retains log details by skipping the log parsing step, and uses the content of the original log messages as model input to avoid the noise caused by log parsing. It uses Word2Vec to compute log word vectors and the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to weight them, producing weighted log sequence feature vectors that cope with the evolution of log statements. Finally, a computationally efficient unsupervised clustering approach is used to identify anomalies. We ran extensive experiments on datasets collected from Blue Gene/L (BGL). The experimental results show that this method achieves higher detection accuracy than LogCluster and is more stable across different windows and feature dimensions.

2. To further improve the accuracy of anomaly detection and to study the impact of log instability on log detection, this thesis proposes an anomaly detection method based on Bidirectional Encoder Representations from Transformers (BERT) without log parsing. The method uses the BERT model for vector computation, which exploits the bidirectional rather than unidirectional context of the current word, so that the computed log sequence feature vector contains contextual information and characterizes the original log more accurately. The log classification task is implemented on top of the semantic vectors generated by the BERT model. The method does not perform regular log parsing operations on the original logs and uses the pre-processed log content as
input to the model, reducing the log instability caused by parsing errors and similar issues. Experimental results show that this method achieves higher accuracy in log anomaly detection than the LogBERT and LogAnomaly methods. We also compared the method against anomaly detection on parsed logs. The results show that the log anomaly detection method without log parsing has higher detection accuracy and effectiveness, and they also validate the applicability of the method to anomaly detection.
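
As an illustration of the first method's feature construction, the following minimal sketch builds a TF-IDF-weighted average of word vectors for each log sequence and then scores sequences by their distance from the centroid. Several things here are assumptions and not taken from the thesis: the word vectors are small hand-made stand-ins for trained Word2Vec output, the smoothed IDF formula is one common variant, the log messages are invented examples, and the centroid-distance score is only a simple placeholder for the unsupervised clustering step, whose exact algorithm is not described above.

```python
import math
from collections import Counter


def tokenize(message):
    """Simple pre-processing only: lowercase and split on whitespace (no template parsing)."""
    return message.lower().split()


def smoothed_idf(sequences):
    """Smoothed IDF per word, treating each log sequence as one document (assumed variant)."""
    n_docs = len(sequences)
    doc_freq = Counter()
    for seq in sequences:
        doc_freq.update({w for msg in seq for w in tokenize(msg)})
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def sequence_vector(seq, word_vectors, idf, dim):
    """TF-IDF-weighted average of the word vectors appearing in one log sequence."""
    counts = Counter(w for msg in seq for w in tokenize(msg))
    total = sum(counts.values())
    vec = [0.0] * dim
    weight_sum = 0.0
    for word, count in counts.items():
        if word not in word_vectors:
            continue  # words without a vector are skipped
        weight = (count / total) * idf[word]
        weight_sum += weight
        for i, x in enumerate(word_vectors[word]):
            vec[i] += weight * x
    return [x / weight_sum for x in vec] if weight_sum > 0 else vec


# Hypothetical 2-dimensional vectors standing in for trained Word2Vec embeddings.
word_vectors = {
    "instruction": [1.0, 0.0], "cache": [0.9, 0.1], "parity": [0.8, 0.2],
    "error": [0.9, 0.2], "corrected": [1.0, 0.1],
    "kernel": [0.0, 1.0], "panic": [0.1, 0.9], "data": [0.2, 0.8],
    "storage": [0.1, 1.0], "interrupt": [0.0, 0.9],
}

# Invented log sequences (one list of raw messages per window).
sequences = [
    ["instruction cache parity error corrected"],
    ["instruction cache parity error corrected", "instruction cache parity error corrected"],
    ["cache parity error corrected"],
    ["kernel panic data storage interrupt"],
]

idf = smoothed_idf(sequences)
vectors = [sequence_vector(seq, word_vectors, idf, dim=2) for seq in sequences]

# Placeholder for the clustering step: distance of each sequence from the overall centroid.
centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]
distances = [math.dist(v, centroid) for v in vectors]
anomaly_index = max(range(len(sequences)), key=distances.__getitem__)
print("most anomalous sequence:", sequences[anomaly_index])
```

Because the raw message text is used directly, a reworded log statement still maps to a nearby sequence vector as long as its words have similar embeddings, which is the property the method relies on to tolerate log evolution.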