| As usage scenarios and requirements diversify and microservice architectures develop rapidly, the scale and complexity of web systems have increased dramatically. Ensuring that services and systems run stably has become a central concern for web operations and maintenance (O&M) personnel. Monitoring the key indicators that describe a system's operating status, and detecting anomalies in them promptly, is essential to keeping the system stable. Two indicator-monitoring methods are in common use: setting a fixed threshold and reporting an anomaly when a value rises above or falls below it, and detecting sudden changes of value in the indicator sequence. These methods have three problems. First, the kinds of anomalies they can detect are limited: only values that are too high or too low relative to the preset threshold, or abnormal sudden changes in value. Second, the operating state of a system differs from period to period, for example with heavier use during peak hours, so the threshold has to change with it, which costs manual effort and depends on the experience of O&M personnel. Third, indicator-level alarms are too fine-grained: when an anomaly occurs, O&M personnel struggle to filter the alarm information, and there is no ability to monitor anomalies at the entity level. To address these problems, this paper replaces traditional O&M methods with intelligent O&M algorithms, studies anomaly detection algorithms for single-indicator and multi-indicator data, and implements an anomaly detection system based on the results. The main work is as follows: (1) A time series anomaly detection model based on deep mining of spatiotemporal features is designed. The model uses an encoder based on VGG+Bi-LSTM to
mine spatiotemporal features in time series data, and uses a fully connected neural network and a Bi-LSTM-based decoder to reconstruct the input data. Compared with the best baseline model, the model's recall increases by 6% and its F1-score by 0.04. (2) An entity-level anomaly detection model is designed. The model takes all the indicators related to a system or service as input and can raise alarms at the system or service level. It uses a Bayesian deep learning approach for multi-indicator anomaly detection. To avoid the underfitting that arises when the data distribution in the latent space is not Gaussian, the model uses autoregressive flows to learn a non-Gaussian posterior distribution in the latent space through a series of invertible mappings. A simple convolutional neural network is used alongside it to extract the dependencies among the variables of the multivariate time series input, and the two networks are trained jointly to reconstruct the input data. Compared with the best baseline model, the model's accuracy increases by 4%, its recall by 5%, and its F1-score by 0.04. (3) With the trained model serving as the anomaly detection module, a fully functional and robust anomaly detection system is designed and implemented, and its functions are tested. |
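
The two conventional monitoring rules that the abstract criticizes, a fixed threshold and sudden-change detection, can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names, the sample values, and the parameters `low`, `high`, and `max_step` are all assumptions made for the example.

```python
from typing import List, Sequence


def threshold_alarms(values: Sequence[float], low: float, high: float) -> List[int]:
    """Return the indices whose value lies outside the fixed band [low, high]."""
    return [i for i, v in enumerate(values) if v < low or v > high]


def sudden_change_alarms(values: Sequence[float], max_step: float) -> List[int]:
    """Return the indices where the value differs from the previous point by more than max_step."""
    return [
        i
        for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_step
    ]


# Hypothetical CPU-utilisation samples in percent; not data from the paper.
cpu = [41.0, 43.0, 42.0, 95.0, 44.0, 42.0, 5.0, 40.0]

print(threshold_alarms(cpu, low=10.0, high=90.0))
print(sudden_change_alarms(cpu, max_step=30.0))
```

Both rules react only to out-of-band values or large jumps, and both need hand-picked parameters that would have to be retuned as the load pattern changes. These are the limitations that motivate the learned models summarized above.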