
Research On Multivariate Anomaly Detection And Fault Localization For Hybrid Cloud System

Posted on: 2022-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wu
GTID: 2518306479993909
Subject: Software Engineering
Abstract/Summary:
With the development of cloud computing, running information business systems and applications on the cloud has become mainstream. At present, many large-scale systems run in a complex hybrid cloud environment, which combines a public cloud with a private cloud or local infrastructure. In such an environment, it is difficult to monitor the system and collect data in real time, and it is also difficult to detect and locate anomalies from the collected data. Based on the characteristics of hybrid cloud systems, this paper designs a general real-time monitoring and data collection framework and proposes a multivariate anomaly detection and univariate fault localization algorithm. Deploying the framework in a specific hybrid cloud system demonstrates its generality and effectiveness. Experimental data from multiple hybrid clouds clearly show the precision, generality and effectiveness of the algorithm. The major contributions of this paper are as follows:

1. A general real-time monitoring and data processing framework. A hybrid cloud is complex and changes dynamically, so general-purpose monitoring methods cannot meet its needs. Based on the characteristics of the hybrid cloud environment and system, this paper therefore designs a general real-time monitoring and data collection framework for hybrid clouds. The framework consists of a real-time monitoring module and a data collection module. The main functions of the real-time monitoring module are operation and maintenance data monitoring, data visualization and simple alerting. Operation and maintenance data monitoring covers public cloud monitoring, private cloud monitoring and local infrastructure monitoring. The main functions of the data collection module are periodic data collection, data standardization and persistent data storage. In this paper, the framework is deployed in a typical hybrid cloud system named "Kfcoding". It collects all operation and maintenance data, including host, container service and database data, and displays them through the visualization function. These data are also used in the research on anomaly detection and fault localization.

2. A multivariate anomaly detection and univariate fault localization algorithm. This paper first defines the scenario of multivariate anomaly detection and fault localization, and simplifies fault localization to a univariate anomaly detection problem on sub time series. In the algorithm design, the multivariate anomaly detection module combines a convolutional neural network with a convolutional long short-term memory network (Conv-LSTM). The paper also considers how anomalies change along both the time dimension and the metric dimension. Using Conv-LSTM to extract features along these two dimensions addresses the problem of multivariate anomaly detection in hybrid cloud systems. The univariate fault localization module combines the weighted moving average (WMA), the exponential moving average (EMA) and STL decomposition. It can precisely and quickly locate the metrics that trigger an anomaly, which addresses the problem of fault localization in hybrid cloud systems. In the experiments, this paper uses hybrid cloud datasets to compare anomaly detection precision, fault localization speed and fault localization precision. A large number of experiments show the effectiveness of the algorithm for anomaly detection and fault localization in hybrid cloud systems.

3. An improved fault localization algorithm based on TCN and an attention mechanism. This paper improves the fault localization algorithm proposed in the second contribution by combining a temporal convolutional network (TCN) with a global attention mechanism, which greatly improves the precision of fault localization. The improved algorithm combines a dilated convolution network with a residual connection module, and adds an attention layer on each convolution layer. It can better capture the characteristics of time series over a larger receptive field, and it addresses the problem that the original fault localization algorithm is fast but not precise enough. A large number of experiments on hybrid cloud datasets show that the improved algorithm is feasible and effective.
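The idea behind the second contribution, treating fault localization as univariate anomaly detection on each metric's time series, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the window size, the `alpha` value and the scoring rule (the smaller of the WMA and EMA residuals, scaled by the series' standard deviation) are all assumptions made for illustration, and the STL decomposition step is omitted.

```python
from typing import Dict, List, Sequence


def wma_forecast(window: Sequence[float]) -> float:
    """Weighted moving average: newer points get linearly larger weights."""
    weights = range(1, len(window) + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def ema_forecast(window: Sequence[float], alpha: float = 0.5) -> float:
    """Exponential moving average over the window, oldest point first."""
    ema = window[0]
    for x in window[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return ema


def anomaly_score(series: Sequence[float], window: int = 5) -> float:
    """Largest scaled deviation of any point from its WMA/EMA forecasts.

    Hypothetical scoring rule: a point counts as deviating only if it is
    far from BOTH forecasts, so the smaller residual is used.
    """
    mean = sum(series) / len(series)
    # Fall back to 1.0 for a constant series to avoid dividing by zero.
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5 or 1.0
    worst = 0.0
    for t in range(window, len(series)):
        past = series[t - window:t]
        residual = min(abs(series[t] - wma_forecast(past)),
                       abs(series[t] - ema_forecast(past)))
        worst = max(worst, residual / std)
    return worst


def localize(metrics: Dict[str, Sequence[float]], window: int = 5) -> List[str]:
    """Rank metric names from most to least anomalous."""
    return sorted(metrics,
                  key=lambda name: anomaly_score(metrics[name], window),
                  reverse=True)


if __name__ == "__main__":
    # Made-up sample data: only "mem" contains a spike.
    sample = {
        "cpu": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        "mem": [1, 1, 1, 1, 1, 1, 1, 9, 1, 1],
        "disk": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    }
    print(localize(sample))
```

Each metric is scored independently, so the ranking cost grows linearly with the number of metrics. That is one plausible reason a moving-average approach is fast, although the abstract does not say why the original algorithm is fast.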
Keywords/Search Tags:anomaly detection, fault localization, attention mechanism, multivariate, hybrid cloud