An Anomaly Detection Study Based On Multi-model Ensemble

Posted on: 2024-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2568306923474344
Subject: Statistics
Abstract/Summary:
The purpose of anomaly detection is to identify data that deviate significantly from the normal pattern or behave differently from the majority of samples in a large dataset. Anomalies are usually associated with issues such as security threats, financial fraud, medical malfunctions, and system failures, which need to be detected early to avoid economic losses. Although many detection methods are available, current algorithms generally generalize poorly because of the wide variety of anomaly types, the diverse scenarios in which anomalies occur, and the class imbalance of anomaly detection data. This thesis therefore aims to design an algorithm with better generalization performance and wider applicability. Ensemble algorithms are commonly used in data mining to reduce a model's dependency on specific datasets or local regions of the data, which significantly improves the robustness of the data mining process. Because of their superior performance, ensemble algorithms have been widely used in fields such as clustering and classification. In anomaly detection, however, research on ensemble algorithms is relatively scarce. This study therefore uses the ensemble approach and proceeds along three lines: theoretical proof, algorithm design, and empirical analysis.

(1) In terms of theoretical proof, we first summarize previous theoretical analyses of the advantages of ensemble models, including decision boundary theory, the ambiguity decomposition of generalization error, and the bias-variance-covariance decomposition, and give a detailed derivation of each. On the one hand, we apply decision boundary theory to weighted-average model fusion in four different situations, distinguished by whether the base models are biased and whether they are correlated, and conclude that the additional error of the ensemble model is smaller than that of a single base model. On the other hand, we apply the bias-variance-covariance decomposition to the weighted-average case, explain the impact of bias, variance, and covariance on generalization error, and provide a formula relating the generalization error of the ensemble model to the average error of the base models.

(2) In terms of algorithm design, this thesis proposes an anomaly detection ensemble algorithm based on focal loss to address the imbalance of anomaly detection datasets. The algorithm uses focal loss as the loss function in place of traditional data sampling methods and combines it with the ensemble idea to construct an optimization model, which is solved with the sequential least squares programming (SLSQP) algorithm. Compared with traditional algorithms, this algorithm places greater weight on minority-class and hard-to-separate samples, which addresses the data imbalance problem and reduces the risk of overfitting.

(3) In terms of empirical analysis, this thesis applies the proposed anomaly detection ensemble algorithm to multiple datasets from different fields, including abnormal online order detection, credit card fraud detection, network intrusion detection, spam email detection, and cancer detection. The experimental results show that the proposed ensemble algorithm achieves good performance and generalization ability on anomaly detection problems in different fields.
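The algorithm design described in (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulation, so everything below is an assumption made for illustration: the ensemble is taken to be a weighted average of base-model anomaly probabilities, the weights are constrained to be non-negative and sum to one, the focal loss uses the standard binary form with hypothetical parameters `gamma` and `alpha`, and the data are synthetic. Only the use of focal loss as the objective and SLSQP as the solver comes from the abstract.

```python
import numpy as np
from scipy.optimize import minimize


def focal_loss(p, y, gamma=2.0, alpha=0.75, eps=1e-12):
    """Mean binary focal loss for anomaly probabilities p and labels y (1 = anomaly)."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability given to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class weight (assumed value)
    # (1 - p_t)^gamma down-weights easy samples, so hard ones dominate the loss
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def fit_ensemble_weights(P, y, gamma=2.0, alpha=0.75):
    """Fit weights for the columns of P (n_samples x n_models) by minimizing focal loss.

    Assumed constraints: each weight lies in [0, 1] and the weights sum to 1.
    """
    n_models = P.shape[1]
    w0 = np.full(n_models, 1.0 / n_models)  # start from the uniform average
    result = minimize(
        lambda w: focal_loss(P @ w, y, gamma, alpha),
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


# Synthetic, imbalanced toy data (about 5% anomalies); not from the thesis.
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.05).astype(int)
signal = y.astype(float)

# Three hypothetical base detectors of decreasing quality.
P = np.column_stack([
    np.clip(0.1 + 0.7 * signal + rng.normal(0.0, 0.1, n), 0.01, 0.99),
    np.clip(0.1 + 0.4 * signal + rng.normal(0.0, 0.2, n), 0.01, 0.99),
    np.clip(rng.random(n), 0.01, 0.99),  # uninformative detector
])

weights = fit_ensemble_weights(P, y)
loss_uniform = focal_loss(P @ np.full(3, 1.0 / 3.0), y)
loss_fitted = focal_loss(P @ weights, y)
print("weights:", np.round(weights, 3))
print("focal loss (uniform vs fitted):", loss_uniform, loss_fitted)
```

In this sketch the class imbalance is handled entirely inside the loss: no resampling of the training data takes place, which mirrors the abstract's statement that focal loss replaces traditional data sampling methods.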
Keywords/Search Tags:Anomaly Detection, Ensemble Learning, Decision Boundary Theory, Bias-Variance-Covariance Decomposition Theory of Generalization Error, Data Imbalance, Focal Loss, Optimization Model, SLSQP