A log is a record of the data or events generated while a computer system is running. As system scale, system complexity, and user expectations of service quality increase, logging systems are becoming more important in modern computer systems, particularly in large-scale distributed systems. Appropriate log information not only records the running state of a system but also provides the basic data for performance diagnosis, fault analysis, auditing, and similar tasks. Large cloud service providers have already deployed large-scale log service systems that provide high-performance, unified log services to multiple systems, which both simplifies distributed system architectures and improves their reliability.

Abnormal data in log records often indicates that the system is experiencing performance fluctuations, performance anomalies, or failures, so log anomaly detection makes it possible to detect and analyze the running state and health of a system in real time. However, most existing log anomaly detection research relies on a single feature, which makes detection inefficient and incomplete and leads to a high misjudgment rate. In addition, existing work does not address log parameter anomalies, which methods based on log sequences cannot identify; the research that does consider log parameters still reduces the problem to sequence anomaly detection without examining anomalies in the parameter values themselves. Finally, log data sets often lack anomaly labels, or label only whether a log line or the block it belongs to is abnormal without marking whether individual log parameters are abnormal, and this lowers the accuracy of log anomaly detection.

To address these shortcomings, this thesis carries out the following research:

(1) A multi-feature log event anomaly detection method is proposed. First, to overcome the inefficiency of single-feature detection, this thesis defines multiple log features (sequence, quantitative, semantic, and time features), vectorizes them, and fuses them. Second, because traditional text representations cannot capture the context information in logs, this thesis combines BERT with TF-IDF to obtain semantic feature vectors and produces the model's input features through feature fusion. Finally, a BiLSTM anomaly detection model based on an attention mechanism is developed to process the sequence data effectively and make the model more sensitive to anomalies. Experiments show that the proposed method achieves an average precision, recall, and F1-score of 95.9%, 96.3%, and 96.0%, respectively, which are 3.55%, 4.1%, and 4.75% higher than those of the feature-based log anomaly detection model.

(2) An improved clustering-based log parameter anomaly detection method that accounts for the characteristics of log parameters is proposed. First, when initializing the cluster centers, the method eliminates the discrete points (outliers) among the log parameters, which addresses the inaccurate and inefficient clustering results produced by the traditional initialization method. Second, because log parameters come in diverse data types, this thesis designs a K-Means log parameter anomaly detection algorithm tailored to log parameter characteristics. Rather than clustering all parameters together to find the cluster centers and identify discrete points, it partitions the parameters by attribute and selects the cluster centers separately for each attribute. Finally, each log parameter is classified as normal or abnormal by calculating its anomaly score (the Amon Score). Experiments show that, compared with traditional machine learning anomaly detection methods, the proposed method has a
9.2% higher detection rate and a 4.6% lower false detection rate.

Using logs to discover potential anomalies in a system is of great significance. On the one hand, this thesis proposes a multi-feature log event anomaly detection method, which defines multiple log features and builds a log event anomaly detection model based on an attention mechanism. On the other hand, it proposes an improved clustering-based log parameter anomaly detection method that exploits log parameter characteristics and improves both the initialization of the K-Means algorithm and the selection of log parameter cluster centers during clustering. Overall, both anomaly detection models improve accuracy over existing log anomaly detection models and can serve as a reference for assisting the discovery of log anomalies.
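For method (1), the abstract does not specify exactly how BERT and TF-IDF are combined or how the features are fused. The sketch below shows one plausible reading under stated assumptions: each event template's semantic vector is a TF-IDF-weighted average of word vectors, and each step of a log window is represented by concatenating that semantic vector with the window's event-count (quantitative) vector and the time gap since the previous event, while the order of the steps carries the sequence feature. The `embed` lookup table is a hypothetical stand-in for BERT embeddings, and all function names are illustrative.

```python
import math
from collections import Counter


def tfidf_weights(templates):
    """TF-IDF weight of every word in every log template.

    templates: dict event ID -> list of (already tokenized) words.
    Returns dict event ID -> {word: weight}.
    """
    n_docs = len(templates)
    # Document frequency: number of templates containing each word.
    df = Counter(word for words in templates.values() for word in set(words))
    weights = {}
    for event_id, words in templates.items():
        tf = Counter(words)
        weights[event_id] = {
            w: (c / len(words)) * math.log(n_docs / df[w]) for w, c in tf.items()
        }
    return weights


def semantic_vector(word_weights, embed, dim):
    """TF-IDF-weighted average of word vectors.

    `embed` is a hypothetical stand-in for BERT word embeddings.
    """
    total = sum(word_weights.values())
    vec = [0.0] * dim
    if total == 0:  # every word occurs in every template: no signal
        return vec
    for word, a in word_weights.items():
        for i, x in enumerate(embed[word]):
            vec[i] += a * x / total
    return vec


def fuse_window(window, timestamps, event_ids, semantic):
    """Fused input sequence for one log window (assumed fusion: concatenation).

    Each step = semantic vector of the event + event counts of the whole
    window (quantitative feature) + time gap to the previous event.
    """
    counts = Counter(window)
    quantitative = [float(counts[e]) for e in event_ids]
    fused = []
    for k, event in enumerate(window):
        gap = timestamps[k] - timestamps[k - 1] if k > 0 else 0.0
        fused.append(semantic[event] + quantitative + [gap])
    return fused
```

Concatenation is the simplest fusion choice: it keeps each feature group intact and leaves it to the downstream BiLSTM to learn how much weight each group deserves.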
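Method (2) can be sketched for numeric log parameters as follows. The abstract does not define the discrete-point elimination rule or the Amon Score, so both are assumptions here: a value is treated as a discrete point if its mean distance to its `m` nearest neighbours exceeds the mean plus `z` standard deviations of that statistic; cluster centers are seeded farthest-first on the remaining values, and K-Means is fitted on those retained values only; a value's score is its distance to the nearest center divided by the median distance of the retained values. Each parameter attribute (for example latency or byte count) is clustered separately, as the thesis describes. All names and thresholds are hypothetical.

```python
import statistics


def knn_spread(values, m=2):
    """Mean distance from each value to its m nearest other values."""
    spread = []
    for i, x in enumerate(values):
        d = sorted(abs(x - y) for j, y in enumerate(values) if j != i)
        spread.append(sum(d[:m]) / len(d[:m]) if d else 0.0)
    return spread


def drop_discrete_points(values, m=2, z=2.0):
    """Hypothetical rule: drop values whose spread exceeds mean + z * std."""
    spread = knn_spread(values, m)
    limit = statistics.mean(spread) + z * statistics.pstdev(spread)
    core = [x for x, s in zip(values, spread) if s <= limit]
    return core or list(values)


def init_centers(core, k):
    """Farthest-first seeding on the retained values, starting at the median."""
    centers = [statistics.median(core)]
    while len(centers) < k:
        centers.append(max(core, key=lambda x: min(abs(x - c) for c in centers)))
    return centers


def kmeans_1d(core, k, iters=50):
    """Plain 1-D K-Means (Lloyd iterations) on the retained values."""
    centers = init_centers(core, k)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in core:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        new = [statistics.mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def detect_parameter_anomalies(params, k=2, threshold=3.0):
    """Cluster each parameter attribute separately and score every value.

    params: dict attribute name -> list of numeric values.
    Returns dict attribute name -> list of (value, score, is_anomalous).
    """
    results = {}
    for attr, values in params.items():
        core = drop_discrete_points(values)
        centers = kmeans_1d(core, min(k, len(core)))
        dists = [min(abs(x - c) for c in centers) for x in values]
        core_dists = [min(abs(x - c) for c in centers) for x in core]
        # Hypothetical score: distance relative to typical in-cluster distance.
        scale = max(statistics.median(core_dists), 1e-9)
        results[attr] = [
            (x, d / scale, d / scale > threshold) for x, d in zip(values, dists)
        ]
    return results
```

Eliminating discrete points before seeding matters because farthest-first seeding would otherwise pick an extreme value as a center, and that center would then absorb the very anomaly the detector is meant to flag.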