The Design And Implementation Of Anomaly Detection And Fault Repair System For Container Runtime Environment

Posted on: 2022-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Chi
Full Text: PDF
GTID: 2518306551954219
Subject: Master of Engineering
Abstract/Summary:
Containers are becoming a common choice for next-generation microservice-based applications and play an indispensable role in cloud-native production environments. In practice, as container clusters keep growing, the underlying infrastructure keeps being replaced, and service resource requirements keep changing, the container runtime cannot always provide a continuously stable operating environment for applications. When an anomaly occurs in the container runtime environment, it is difficult to locate it quickly and to bring it under control quickly. In this context, this thesis designs and implements an anomaly detection and fault repair system for the container runtime environment.

The system consists of four modules: model training, anomaly detection, root cause analysis, and fault repair. The model training module provides the training process for high-performance anomaly classification models: it labels time series data using unsupervised learning algorithms and then builds supervised anomaly classification models. The anomaly detection module provides the anomaly analysis workflow: it analyzes real-time data with multiple anomaly classification models and generates the current distribution of anomalies across the cluster. The root cause analysis module traces the root cause of anomalous events from the current cluster anomaly distribution and locates the faulty container. The fault repair module provides a fault repair model, which it uses to select the optimal repair action for an anomalous event and quickly restore the container environment to a healthy state.

In addition, in root cause analysis the system introduces an anomaly-marker analysis strategy to construct the anomaly link relationships between different levels, and uses the Viterbi algorithm to trace back the anomaly propagation links across all levels, which enables tracing a failure to its root cause. In fault repair, it introduces a fault repair matrix model that simulates and expresses the fault repair process, so that the system can store the repair process of every fault in a formal representation.
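The abstract does not say how the Viterbi algorithm is applied, so the following is only a minimal sketch of one plausible reading, not the thesis's implementation. It assumes the anomaly links form a layered graph (for example service, pod, container), that each component has an anomaly probability produced by the detectors, and that each link between adjacent levels has a propagation probability. The function name `viterbi_path`, the component names, and all numbers are hypothetical.

```python
import math


def viterbi_path(levels, emit, trans):
    """Return the most probable anomaly chain, one component per level.

    levels: list of lists of component names, ordered from the symptom
            level down to the candidate root-cause level (assumed layout).
    emit:   dict name -> anomaly probability in (0, 1] (assumed detector output).
    trans:  dict (upper, lower) -> probability that the anomaly in `upper`
            propagated from `lower`; a missing pair means "no link".
    """
    # best[name] = log-probability of the best chain ending at `name`
    best = {name: math.log(emit[name]) for name in levels[0]}
    back = []  # one back-pointer table per step between adjacent levels

    for prev_level, level in zip(levels, levels[1:]):
        new_best, pointers = {}, {}
        for cur in level:
            candidates = [
                (best[prev] + math.log(trans[(prev, cur)]), prev)
                for prev in prev_level
                if prev in best and trans.get((prev, cur), 0) > 0
            ]
            if not candidates:
                continue  # not reachable from any anomaly one level up
            score, parent = max(candidates)
            new_best[cur] = score + math.log(emit[cur])
            pointers[cur] = parent
        best = new_best
        back.append(pointers)

    if not best:
        raise ValueError("no anomaly chain reaches the last level")

    # Backtrack from the best component on the last level to the symptom.
    node = max(best, key=best.get)
    prob = math.exp(best[node])
    path = [node]
    for pointers in reversed(back):
        node = pointers[node]
        path.append(node)
    path.reverse()
    return path, prob


# Hypothetical three-level cluster: one service, two pods, three containers.
levels = [["svc-a"], ["pod-1", "pod-2"], ["ctr-x", "ctr-y", "ctr-z"]]
emit = {"svc-a": 0.9, "pod-1": 0.8, "pod-2": 0.3,
        "ctr-x": 0.2, "ctr-y": 0.9, "ctr-z": 0.7}
trans = {("svc-a", "pod-1"): 0.6, ("svc-a", "pod-2"): 0.4,
         ("pod-1", "ctr-x"): 0.5, ("pod-1", "ctr-y"): 0.5,
         ("pod-2", "ctr-z"): 1.0}

path, prob = viterbi_path(levels, emit, trans)
print(path, round(prob, 4))
```

In this toy input the last element of `path` plays the role of the located fault container. Working in log-probabilities avoids numerical underflow when the chain spans many levels.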
Keywords/Search Tags: Container, Anomaly Detection, Fault Repair