
Research On Container Cascade Fault Detection Technology Based On Fault Correlation Analysis

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Zhong
GTID: 2568306461452704
Subject: Computer technology
Abstract/Summary:
Containers on a container cloud platform are lightweight and numerous, and they depend on and interact with one another, so cascading faults between containers occur frequently. At the same time, faults are rare relative to normal operation, so the platform's historical data contain an imbalanced ratio of cascading-fault records to non-fault records. This imbalance keeps the accuracy and recall of cascading fault detection methods built on historical data low. Yet a single fault cascade that is not detected and handled in time can affect most services on the platform and cause heavy losses. To detect container cascade faults accurately and reduce the platform's fault infection rate once a fault occurs, this thesis studies techniques for optimizing container cascade fault detection.

First, cascading fault models based on traditional correlation analysis do not adequately account for the spatial dimension of a container cloud platform, that is, the application, service, node, and fault domain in which each container is placed. Nor do they account for the temporal dimension, that is, how a cascading fault spreads across these spatial domains over time. As a result, they cannot accurately compute the fault propagation probability between containers in different spatial domains. This thesis therefore proposes a method for constructing a time-dependent container cascade fault model and implements container cascade fault detection on top of it. From the container fault cascade history, the method mines frequently associated faulty container instances, describes the logical relationships between individual container faults, and analyzes container fault propagation paths. It also uses the cascade history to analyze the temporal relationships of fault propagation between containers. It combines these with the containers' spatial state data on the platform to analyze the spatial relationships of propagation, and it quantitatively computes the time correlation and spatial correlation of container fault propagation. By describing cascade propagation paths with a combination of qualitative and quantitative methods, the model captures the fault propagation probabilities that traditional correlation analysis models ignore. Compared with cascade fault detection based on traditional correlation analysis, the proposed model improves detection accuracy and recall by about 10%.

Second, a single cascade fault detection model generalizes poorly and struggles to detect previously unseen cascade paths, so machine learning is commonly used to train the cascade model for better accuracy and generalization. In a real container cloud environment, however, the rarity of cascading faults makes the historical data imbalanced. Most traditional learning methods assume balanced data and ignore the skewed ratio of cascading-fault records to normal records, so the resulting detection models have low accuracy and recall. To address these problems, this thesis proposes an ensemble learning method for cascade fault models trained on imbalanced historical data. Dynamic feedback sampling of the historical samples raises the sampling probability of samples the model detected incorrectly, so those fault samples receive more attention in the next round of training. Ensemble learning over data partitions, with multiple models trained in parallel, resolves the sample imbalance, and iterative enhancement produces the final container cascade fault detection model, further improving detection recall.

Finally, a prototype cascade fault detection system for container cloud clusters, Cascade-Warn, is designed and experimentally validated on a Docker-based container cluster. The results show that, compared with existing cascade fault detection strategies, the proposed method achieves higher detection accuracy and recall and a lower model error and false positive rate.
Keywords/Search Tags: Cloud container, Cascading faults, Correlation model, Ensemble learning, Fault detection
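The abstract does not say how the time and spatial correlation of fault propagation are quantified. The sketch below shows one hypothetical approach using only the Python standard library. It counts how often container `b` fails within a time window after container `a`, which acts as a support/confidence measure for frequently associated fault pairs. It converts the average lag into a time correlation through exponential decay, and it scores spatial correlation as the weighted share of placement attributes (application, service, node, fault domain) the two containers have in common. The three values are then multiplied into a propagation score. The function names, the window, the decay constant `tau`, the attribute weights, and the product rule are all assumptions, not the thesis's model.

```python
import math
from collections import defaultdict

# Hypothetical weights for the spatial attributes named in the thesis; they sum to 1.
SPATIAL_WEIGHTS = {"fault_domain": 0.1, "node": 0.4, "service": 0.3, "application": 0.2}

def spatial_correlation(placement, a, b):
    """Weighted share of spatial attributes that containers a and b have in common."""
    pa, pb = placement[a], placement[b]
    return sum(w for key, w in SPATIAL_WEIGHTS.items() if pa.get(key) == pb.get(key))

def propagation_scores(events, placement, window=60.0, tau=30.0, min_support=2):
    """Score a -> b for container pairs that fail together often enough.

    events: list of (container_id, timestamp_seconds) fault records.
    window: b must fail at most this many seconds after a to count as a cascade.
    tau: decay constant turning the mean lag into a time correlation (assumption).
    min_support: minimum number of a -> b cascades for a pair to be kept.
    """
    events = sorted(events, key=lambda e: e[1])
    fault_count = defaultdict(int)   # number of fault events per container
    follow_lags = defaultdict(list)  # (a, b) -> lags of b's faults after a's faults
    for i, (a, ta) in enumerate(events):
        fault_count[a] += 1
        seen = set()
        for b, tb in events[i + 1:]:
            if tb - ta > window:
                break  # events are sorted, so nothing later is inside the window
            if b != a and b not in seen:  # count only b's first fault after this a
                seen.add(b)
                follow_lags[(a, b)].append(tb - ta)
    scores = {}
    for (a, b), lags in follow_lags.items():
        if len(lags) < min_support:  # keep only frequently associated pairs
            continue
        confidence = len(lags) / fault_count[a]  # P(b fails within window | a fails)
        time_corr = math.exp(-(sum(lags) / len(lags)) / tau)  # shorter lag, stronger link
        space_corr = spatial_correlation(placement, a, b)
        scores[(a, b)] = confidence * time_corr * space_corr
    return scores

# Toy cascade history and placement (hypothetical container names).
events = [("db", 0), ("api", 10), ("web", 25), ("db", 100),
          ("api", 115), ("db", 300), ("cache", 500)]
placement = {
    "db": {"application": "shop", "service": "storage", "node": "n1", "fault_domain": "z1"},
    "api": {"application": "shop", "service": "backend", "node": "n1", "fault_domain": "z1"},
    "web": {"application": "shop", "service": "frontend", "node": "n2", "fault_domain": "z1"},
    "cache": {"application": "shop", "service": "storage", "node": "n3", "fault_domain": "z2"},
}
scores = propagation_scores(events, placement)
print(scores)
```

With `min_support=2`, only `db -> api` survives. `db` fails three times, `api` follows within the window twice, and the two containers share a node, a fault domain, and an application.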
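The abstract also describes an ensemble learning method that combines dynamic feedback sampling with learning over data partitions, but it gives no algorithmic details. The following is only a minimal sketch under stated assumptions. A decision stump stands in for the base detector. Samples the current model misclassifies have their sampling weight multiplied by `boost` before the next round, which is boosting by resampling. The normal (majority) records are split into disjoint partitions, and each partition is paired with all fault records, similar to EasyEnsemble. Finally, every stump votes. All function names, features, and parameters (`boost`, `n_partitions`, `rounds`) are hypothetical.

```python
import random

def train_stump(X, y):
    """Weak learner: the single (feature, threshold, direction) rule with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                # sign=1 flags x[f] >= thr as a fault; sign=-1 flags x[f] <= thr
                errors = sum(int(sign * (x[f] - thr) >= 0) != t for x, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: int(sign * (x[f] - thr) >= 0)

def feedback_boost(X, y, rounds, rng, boost=2.0):
    """Resample each round; samples detected incorrectly become more likely to be drawn."""
    weights = [1.0] * len(X)
    models = []
    for _ in range(rounds):
        idx = rng.choices(range(len(X)), weights=weights, k=len(X))
        model = train_stump([X[i] for i in idx], [y[i] for i in idx])
        models.append(model)
        for i, (x, t) in enumerate(zip(X, y)):
            if model(x) != t:  # dynamic feedback on detection errors
                weights[i] *= boost
    return models

def train_ensemble(X, y, n_partitions=3, rounds=5, seed=0):
    """Pair each disjoint slice of normal samples with every fault sample, boost a group
    of stumps per slice, and combine all stumps by majority vote."""
    rng = random.Random(seed)
    faults = [x for x, t in zip(X, y) if t == 1]
    normals = [x for x, t in zip(X, y) if t == 0]
    rng.shuffle(normals)
    models = []
    # The partitions are independent; with one RNG each, this loop could run in parallel.
    for p in range(n_partitions):
        part = normals[p::n_partitions]
        Xp = faults + part
        yp = [1] * len(faults) + [0] * len(part)
        models.extend(feedback_boost(Xp, yp, rounds, rng))
    return lambda x: int(2 * sum(m(x) for m in models) > len(models))

# Toy imbalanced history: 30 normal samples, 4 fault samples.
# Hypothetical features: [cpu_spike, restarts_in_window].
X = [[0.1 * i, 0.0] for i in range(30)] + [[5.0 + 0.1 * i, 1.0] for i in range(4)]
y = [0] * 30 + [1] * 4
predict = train_ensemble(X, y)
print(predict([6.0, 1.0]), predict([1.0, 0.0]))
```

Partitioning keeps every base model's training set roughly balanced (4 faults against 10 normal samples here), without discarding normal data overall. The feedback weights push later rounds toward the fault samples that earlier stumps missed, which is the mechanism the abstract credits for the improved recall.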