
Anomaly Detection Algorithm For Spatiotemporal Data In Cloud Environment

Posted on: 2022-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Shi
Full Text: PDF
GTID: 2518306782952599
Subject: Automation Technology
Abstract/Summary:
In recent years, the number of cloud applications has grown continuously, which makes accurate and effective monitoring of cloud server data critical. Given the complex data patterns of cloud servers, simple manual observation is not enough; anomalies in the data must be identified by an algorithm, which provides a basis for subsequent fault localization and equipment maintenance. Despite years of development, current anomaly detection methods still have the following shortcomings:

(1) Limited usage scenarios. Existing methods rely heavily on an expert knowledge base and human assistance within a single field, and the corresponding rules must be continuously updated when an anomaly occurs. The data to be detected is also required to follow a certain distribution, so it is difficult to find a suitable model when the data distribution is uncertain.

(2) High label dependence and imbalanced data samples. To improve accuracy, supervised methods require a large number of abnormal samples to address the imbalance between positive and negative samples. Unsupervised methods have low computational efficiency, are not suitable for online detection, and require continuous adjustment of parameters and thresholds. When the data dimension is high, outliers cannot be judged accurately.

(3) Insufficient feature utilization. Algorithms tend to ignore too many data features (such as temporal and spatial features) when processing multi-dimensional data. When the importance of features cannot be judged, characteristic information in the data is lost, which degrades the final anomaly detection result. In addition, the correlation between data dimensions may be ignored.

(4) Fuzzy standards for anomaly thresholds. At present, thresholds are largely defined manually. Datasets in different fields differ in how they define an anomaly, and detection results depend heavily on the choice of threshold.

In view of these shortcomings, this thesis proposes an unsupervised, reconstruction-based anomaly detection method that uses spatiotemporal features. The specific work is as follows. For data with temporal and spatial features, the spatial features and attributes of the data are first described by a graph model in which each cloud server is a node and each connection relationship is an edge; a graph convolutional network (GCN) then extracts the spatial features. Next, the spatial features at different times are arranged into a time series and fed into a long short-term memory (LSTM) network to extract temporal features. The trained GCN-LSTM model reconstructs the input data, and the reconstructed data is compared with the original input to define the reconstruction error. Finally, Copula-Based Outlier Detection (COPOD) is used to compute the empirical cumulative distribution of the reconstruction error and the tail probability at each time snapshot for anomaly detection. In the experiments, the method is evaluated on the MBD dataset from a big data batch processing system and the MMS dataset from a microservice-based transaction processing system. The results show that the method makes full use of the features in the data and completes the anomaly detection task without supervision and without manual threshold adjustment, which demonstrates its effectiveness and accuracy.

However, when the GCN-LSTM model reconstructs data, most of the reconstructed data can be considered normal, but anomalous data is also reconstructed by the model. The resulting reconstruction errors are therefore uncertain: errors can be cancelled out, which reduces the effectiveness of the model. To further improve performance, this thesis introduces the Encoder-Decoder framework and turns the reconstruction model above into a Seq2Seq model. The information in the input sequence is encoded by one GCN-LSTM and then decoded by another GCN-LSTM, forming a new GCN-LSTM Encoder-Decoder architecture that yields an optimized reconstructed sequence. In addition, to address the performance degradation caused by long sequences, this thesis introduces an attention mechanism that selects the most relevant information in the sequence for global consideration, which further improves the reconstruction model. Finally, the improved model is evaluated on the same datasets with the same COPOD anomaly detection method as before. Each performance indicator improves by about 5%-25%, which shows that the anomaly detection method based on the GCN-LSTM Encoder-Decoder and the attention mechanism extracts and reconstructs data features better without relying on data labels, and detects anomalies effectively.
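The pipeline described in the abstract (graph convolution for spatial features, reconstruction error, tail-probability scoring) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function names, the toy three-server chain graph, the random weights and the synthetic data are all assumptions made for illustration. The trained GCN-LSTM is replaced by a stand-in that reproduces the normal pattern with small noise, and the scoring step uses only the empirical right-tail probability, which is a simplification of COPOD (the full method also uses left-tail and skewness-corrected probabilities).

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the propagation matrix of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph-convolution step with ReLU: relu(A_norm @ X @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


def reconstruction_error(observed, reconstructed):
    """Mean absolute error per snapshot and node; (T, N, F) inputs give (T, N)."""
    return np.abs(observed - reconstructed).mean(axis=-1)


def right_tail_scores(errors):
    """Score each snapshot by summing -log(empirical right-tail probability) over nodes."""
    n_snapshots = errors.shape[0]
    # tail[i, j]: fraction of snapshots whose error on node j is >= the error at snapshot i
    tail = (errors[None, :, :] >= errors[:, None, :]).sum(axis=1) / n_snapshots
    return -np.log(tail).sum(axis=1)


rng = np.random.default_rng(0)

# Hypothetical topology: three servers connected in a chain
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

# Synthetic data: T=50 snapshots, N=3 nodes, F=2 metrics per node
normal = rng.normal(size=(50, 3, 2))
observed = normal.copy()
observed[30] += 5.0  # injected anomaly at snapshot 30

# Spatial features for a single snapshot, with random (untrained) weights
spatial = gcn_layer(adj, observed[0], rng.normal(size=(2, 4)))

# Stand-in for the trained model's output (assumption): normal pattern plus small noise
reconstructed = normal + 0.01 * rng.normal(size=normal.shape)

scores = right_tail_scores(reconstruction_error(observed, reconstructed))
print(int(scores.argmax()))  # 30, the snapshot with the injected anomaly
```

Because the score is derived from empirical tail probabilities rather than raw error magnitudes, no threshold on the error scale has to be chosen by hand, which is the property the abstract attributes to the COPOD step. For real use, a maintained implementation such as the COPOD detector in the PyOD library would likely be preferable to this simplified scoring.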
Keywords/Search Tags:Anomaly detection, Graph Convolutional Network, Long Short-Term Memory, Data reconstruction, COPOD