Research On Oversampling And Classification Methods Of Imbalanced Data In Intrusion Detection

Posted on:2023-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

GTID:2568306836469664

Subject:Cyberspace security

Abstract/Summary:

Network security has become a matter of broad public concern. Over time, the threat from common attacks such as Distributed Denial of Service and SQL injection has grown. An intrusion detection system (IDS) is one of the critical technologies for maintaining network security. Using machine learning and deep learning techniques, an IDS can effectively detect abnormal network behaviour. However, data imbalance still degrades the classification performance of intrusion detection systems. Training existing detection methods on imbalanced data leads to overfitting, which reduces the detection rate on attack samples.

To address the imbalance problem in the intrusion detection domain, the paper proposes an oversampling method based on clustering and instance hardness. The method first pre-processes the input traffic data. For each minority sample, it calculates the proportion of majority-class samples among the sample's nearest neighbours and takes the result as the hardness value. It then clusters the minority data. Next, the statistical optimal allocation method is used to calculate the amount of data to generate in each cluster, and the hardness values are used to delimit the 'safe' area within each cluster. Finally, new samples are created by interpolation within that area. This method generates synthetic data and addresses the imbalance problem at the data level.

The paper also proposes a classification method based on an ensemble of unsupervised learning techniques to address the imbalance problem in intrusion detection. First, the input data is pre-processed. Second, a correlation distance matrix is constructed over the features of the processed data, and the features are divided into several groups by clustering. Third, lightweight autoencoders are constructed. Every autoencoder is a three-layer neural network that uses the sigmoid activation function in each layer, and each feature group is used to train its own autoencoder. The reconstruction errors of all autoencoders are then calculated by reconstructing the input data. Finally, the Isolation Forest algorithm classifies the samples based on these errors. This algorithm-level method counters the imbalance problem through unsupervised techniques such as autoencoders.

Experimental results show that the proposed oversampling method has better generalization ability and classification accuracy than other sampling methods, and that it can be applied well to intrusion detection. The results also show that the proposed classification method achieves a higher detection rate and takes less time than other methods based on unsupervised learning.
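The hardness and interpolation steps of the oversampling method described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Euclidean distance, the neighbour count `k`, the hardness threshold of 0.5 for the 'safe' area, and the toy data are all assumptions made for illustration, and the clustering and optimal-allocation steps are omitted.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def instance_hardness(X, y, minority_label, k=3):
    """For each minority sample, the fraction of majority samples among its k neighbours."""
    hardness = {}
    for i, label in enumerate(y):
        if label != minority_label:
            continue
        neighbours = knn_indices(X, i, k)
        hardness[i] = sum(y[j] != minority_label for j in neighbours) / len(neighbours)
    return hardness


def oversample_safe(X, hardness, n_new, threshold=0.5, seed=0):
    """Interpolate between pairs of 'safe' minority samples (hardness <= threshold).

    The threshold is an assumed way of delimiting the 'safe' area.
    """
    rng = random.Random(seed)
    safe = [i for i, h in hardness.items() if h <= threshold]
    if len(safe) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(safe, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([xa + t * (xb - xa) for xa, xb in zip(X[a], X[b])])
    return synthetic


# Toy data: four minority points near the origin, six majority points near (5, 5),
# and one minority point buried inside the majority region.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.2, 5.0], [5.0, 5.2],
     [5.05, 5.05]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

hardness = instance_hardness(X, y, minority_label=1, k=3)
new_samples = oversample_safe(X, hardness, n_new=5)
```

In this toy setting the minority point inside the majority region receives the maximum hardness and is excluded from interpolation, so every synthetic sample stays inside the compact minority cluster. In the method the abstract describes, the same idea would be applied per cluster, with the per-cluster sample count coming from the allocation step.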

Keywords/Search Tags:

Intrusion detection, Imbalanced data, Oversampling, Autoencoder

Related items

1	Research And Application Of Imbalanced Data Classification Based On Oversampling Algorithm
2	Research On Fast Network Intrusion Detection Methods Under Imbalanced Data
3	Research On Intrusion Detection Model Based On Feature Selection And Machine Learning Algorithm
4	Research On Imbalanced Data Classification Method Based On Generation Model And Its Application
5	Research Of Imbalanced Data Ensemble Classification Algorithm Based On Oversampling
6	Research On Cover-based Algorithms For Oversampling On Imbalanced Data
7	Research On Under-sampling Algorithm For Imbalanced Data Based On Clustering And Its Application
8	Research On Intrusion Detection Model For Imbalanced Dataset
9	Intrusion Detection System Based On Autoencoder And Convolutional Neural Network
10	Research On Network Intrusion Detection Method For Class Imbalanced Data