Research On Intrusion Detection Methods Based On Feature Selection And Ensemble Learning

Posted on: 2021-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
GTID: 2428330626466134
Subject: Computer Science and Technology
Abstract/Summary:
Intrusion detection is an active area of network security research. It is an effective measure against host and network attacks, and it compensates for the limited protection offered by traditional firewall, signature authentication, and access control technologies. Traditional intrusion detection relies on manually constructed rule bases to identify intrusions, which involves a heavy workload and yields a low detection rate. Machine learning offers a new approach: intrusion detection can be treated as a classification problem, in which a classification model is built from an intrusion detection training set and then used to classify new data. However, building such models raises several problems. The classes in intrusion detection training sets are unevenly distributed, so a trained model has difficulty classifying the minority classes accurately. In addition, the data have a relatively high feature dimension with redundancy among features, which reduces both the accuracy and the efficiency of classification. To address these problems, the main research contents of this thesis are as follows:

(1) The intrusion detection benchmark data set KDDcup99 was preprocessed to remove redundant records, which greatly reduced the size of the data and narrowed the gap in sample counts between categories. On this basis, a detailed statistical analysis was performed on the values of each feature. Features for which a single fixed value accounts for more than 99.9% of the records were identified, the distribution of their remaining values across categories was analyzed to find the features with little influence, and 5 features were removed from the training set before model construction. In addition, a grid search was performed over the hyperparameters of the basic classifier used in the thesis, and several good parameter combinations were found.

(2) A feature selection technique based on mutual information and the firefly algorithm is proposed. First, the mutual information between each feature and the category label is calculated while reducing the information redundancy between features as much as possible, and the features are sorted by the magnitude of their mutual information. Then an improved firefly algorithm searches for the best feature subset in the original feature space; in each iteration, features are added to or removed from the best solution according to their importance. Finally, the two subsets obtained from mutual information and from the firefly algorithm are merged through a merging strategy to obtain the final feature subset. Experiments verify that the proposed feature selection method improves the classification model.

(3) Two ensemble learning methods are proposed to address data imbalance in intrusion detection. The first is a Bagging ensemble method based on restricted random sampling and feature selection, with a decision tree as the basic classifier. To ensure both the diversity of the basic models and the balance of data between categories, a restricted random sampling strategy is proposed for extracting the training subsets. The feature subset used by each basic classifier is chosen with a semi-random strategy based on the best feature subset. The second is a Stacking ensemble method based on improved clustering under-sampling. It improves the original clustering-based under-sampling technique by reducing the amount of data in the majority classes while preserving the original data distribution of each class as far as possible. Experiments verify that both ensemble learning methods improve the classification accuracy of the minority classes.
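The preprocessing in (1) and the mutual-information ranking that opens (2) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the thesis's implementation: the function names, the toy records, and the record layout (feature values followed by a label) are assumptions made for the example. It assumes discrete feature values, so continuous KDDcup99 features would need discretizing first. It only flags near-constant features as removal candidates (the thesis additionally analyzes how their remaining values are distributed across categories), and it ranks features by relevance alone, without the redundancy reduction or the firefly search described above.

```python
from collections import Counter
from math import log2


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping first-seen order."""
    return list(dict.fromkeys(rows))


def near_constant_features(rows, feature_indices, threshold=0.999):
    """Indices of features whose most common value covers more than
    `threshold` of the records (0.999 mirrors the 99.9% figure above)."""
    flagged = []
    for j in feature_indices:
        counts = Counter(row[j] for row in rows)
        dominant_share = counts.most_common(1)[0][1] / len(rows)
        if dominant_share > threshold:
            flagged.append(j)
    return flagged


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_by_mutual_information(rows, feature_indices, label_index=-1):
    """Sort feature indices by mutual information with the label, highest first."""
    labels = [row[label_index] for row in rows]
    scores = {
        j: mutual_information([row[j] for row in rows], labels)
        for j in feature_indices
    }
    return sorted(feature_indices, key=scores.get, reverse=True)


# Hypothetical toy records: (protocol, flag_a, flag_b, label).
records = [
    ("tcp", 0, 1, "normal"),
    ("tcp", 0, 1, "normal"),  # exact duplicate
    ("udp", 0, 0, "attack"),
    ("tcp", 0, 0, "attack"),
    ("udp", 0, 1, "normal"),
    ("udp", 0, 0, "attack"),  # exact duplicate
]

unique = drop_duplicates(records)
flagged = near_constant_features(unique, [0, 1, 2])
kept = [j for j in [0, 1, 2] if j not in flagged]
ranked = rank_by_mutual_information(unique, kept)
```

On the toy records, deduplication leaves four rows, the always-zero second feature is flagged as near-constant, and the third feature ranks above the first because it determines the label while the first is independent of it.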
Keywords/Search Tags:Intrusion detection, Basic classifier, Feature selection, Ensemble learning