Anomaly Data Detection Based On Ensemble Learning And Its Application In Network Traffic Data

Posted on:2022-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:L Wang

Full Text:PDF

GTID:2507306350464134

Subject:Applied Statistics

Abstract/Summary:

Anomaly detection is an important task in knowledge discovery and machine learning. With the rapid development of the Internet, the security, authenticity and validity of data have become a widespread concern. In large data sets, however, abnormal data is inevitable, so detecting outliers is vital for network security and for helping enterprises and individuals avoid risk. The main purpose of anomaly detection is to find, within the given data, samples that clearly deviate from the normal pattern or behave differently from most other samples, so that they can be identified and the associated risk avoided. In network traffic data, anomaly detection can act as the "sentinel" of network security, giving early warning of security threats and illegal intrusions. It can also be used for abnormal user identification, traffic cheating detection, abnormal order detection, rare disease identification, fraud detection, loan risk identification, scalper identification and similar tasks.

Against this background, this paper studies anomaly detection based on the idea of ensemble learning, demonstrates the effectiveness of the proposed method through experimental results, and shows its applicability in two specific application scenarios. Because anomaly data is imbalanced, most single models tend to overfit in anomaly detection. Through a study of ensemble methods such as Isolation Forest, Random Forest, AdaBoost, XGBoost and LGBM, this paper finds that ensemble models handle this problem well and achieve higher detection accuracy than single models. In practical applications, however, it is difficult to choose a suitable ensemble model based on the structure of the abnormal data. Therefore, based on the idea of ensemble learning, this paper proposes two model fusion anomaly detection methods, one based on Stacking and one based on Voting, which on the one hand improve model accuracy and reduce the risk of overfitting, and on the other hand avoid the poor learning performance caused by improper model selection.

The KDD Cup 1999 data set was used to train the proposed fusion models and the commonly used anomaly detection models, and the results were evaluated with macro precision, macro recall, macro F1-score, the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC). The comparison shows that the fusion models outperform the single models and perform best among all the methods tested. Furthermore, the AUC of the Stacking-based fusion algorithm is higher than that of the Voting-based fusion algorithm, but its training is relatively time-consuming. In addition, Isolation Forest performs relatively well, and it can be selected for anomaly detection on unlabeled data in practical applications.

Finally, based on two practical scenarios in network traffic data, anomaly detection of advertising traffic and anomaly detection of order traffic, this paper further illustrates the applicability of the fusion model by modeling and analyzing real data.
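The Stacking and Voting fusion described in the abstract, together with the macro-averaged metrics and AUC used for evaluation, can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes scikit-learn, uses a synthetic imbalanced data set in place of KDD Cup 1999, and picks Random Forest, AdaBoost and logistic regression as illustrative base learners, with logistic regression as an assumed meta-learner for Stacking. All variable and function names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for labeled traffic records
# (roughly 5% anomalies). This is NOT the KDD Cup 1999 data set.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def base_learners():
    """Return fresh (name, estimator) pairs; the choice of learners is an assumption."""
    return [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]


models = {
    # Stacking: a meta-learner is trained on cross-validated base-model outputs.
    "stacking": StackingClassifier(
        estimators=base_learners(),
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
    # Soft voting: predicted class probabilities of the base models are averaged.
    "voting": VotingClassifier(estimators=base_learners(), voting="soft"),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the anomaly class
    results[name] = {
        "macro_precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "macro_recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
        "macro_f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Macro averaging gives the rare anomaly class the same weight as the normal class, which is why it suits imbalanced data better than plain accuracy. The scores on this synthetic data say nothing about the results reported in the thesis.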

Keywords/Search Tags:

Anomaly Detection, Ensemble Learning, Model Fusion, Isolation Forest, Traffic Data

Related items

1	Research On Unsupervised Anomaly Detection For High-dimensional Data Based On Autoencoder Ensembles
2	Anomaly Detection And Quality Assessment Of University Contract Data Based On Machine Learning
3	Anomaly Detection Based On Ensemble Learning
4	Research On Anomaly Detection And Visualization Technology In College Student Physical Fitness Test Management System
5	Research On Emotional Abnormal Detection Based On Weibo Review Data
6	Fall Detection System Design Based On Multi-sensor Feature Fusion And SVM
7	Research On Learning Effectiveness Prediction Based On Online Learner Data
8	Research On Anomaly Detection Method Of Examination Surveillance Video Based On Contrastive Learning And Pretext Tasks
9	Forecasting Loan Default Based On Random Forest Model Fusion
10	Research On Data Mining And Early Warning Mechanism Of The Influencing Factors Of Junior High School Students’ Mental Health