
Research On Feature Selection Method Based On Information Entropy And Iterative SVM

Posted on: 2020-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: L M Chao
GTID: 2428330575462057
Subject: Computer Science and Technology
Abstract/Summary:
Classification models are commonly used in machine learning and are the most important part of intrusion detection systems. With the rapid development of the Internet industry in recent years, the volume of security data has grown exponentially, which places new requirements on classification models in terms of accuracy and real-time performance. How to keep the model simple while preserving accuracy is a focus of research in the field of security analysis. Feature selection studies how to choose, from the full feature set, a subset of features that are important for the classification problem. It can effectively remove redundant features in a security analysis data set that are unrelated to the purpose of the analysis, so that the subsequent analysis model becomes simple and efficient while over-fitting is avoided. Based on these ideas, this thesis studies a feature subset selection method. The main work is as follows.

Firstly, a method for calculating entropy values based on information entropy theory and fuzzy set knowledge is proposed. Since entropy is a good indicator of the uncertainty of a variable, the entropy value of each feature on the training data set indicates its impact on the uncertainty of the classification problem. Based on this idea, the proposed method calculates a feature importance ranking matrix. It runs in the pre-processing stage, so the feature ranking is available before the classification model is built.

Secondly, a feature subset selection method based on an iterative support vector machine (SVM) is proposed. This method learns the SVM model iteratively. At the beginning, the feature subset is empty. At each step of the iteration, one feature is selected and added to the subset; the choice is based on the feature importance evaluation method proposed in the first part and on the influence of the feature on the objective function of the SVM classifier. The iterative process continues until the accuracy of the SVM model on the test set no longer improves. The selected feature subset is the most efficient set of features for the classification problem.

Finally, the proposed feature selection method is compared with other classical methods on the UNSW-NB15 intrusion detection system (IDS) data set. The experimental results show that the proposed method reduces the complexity of the model while improving its accuracy.
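
The two stages described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the fuzzy-set entropy formula, so plain Shannon information gain over equal-width bins is used as a stand-in, and the iterative step adds features in ranked order only, without the SVM objective-function term. The names `information_gain`, `rank_features`, `forward_select`, and `score_fn` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """Reduction in label entropy after equal-width binning of one numeric feature.

    Assumption: a stand-in for the thesis's fuzzy-set entropy measure.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything lands in bin 0
    groups = {}
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)
        groups.setdefault(b, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names ordered from most to least informative."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


def forward_select(ranked, score_fn):
    """Start from an empty subset and add features in ranked order.

    score_fn(feature_names) is a caller-supplied callable returning a score
    such as held-out accuracy. Selection stops at the first feature that
    does not improve the score.
    """
    selected, best = [], float("-inf")
    for name in ranked:
        score = score_fn(selected + [name])
        if score <= best:
            break
        selected.append(name)
        best = score
    return selected, best


# Toy data: "a" determines the label, the other two columns carry no information.
columns = {"a": [0, 0, 1, 1], "noise": [0, 1, 0, 1], "const": [5, 5, 5, 5]}
labels = [0, 0, 1, 1]
ranked = rank_features(columns, labels)
# Hypothetical scorer standing in for "train an SVM, return its accuracy".
selected, best = forward_select(ranked, lambda f: 0.9 if "a" in f else 0.5)
```

In a setting like the one in the abstract, `score_fn` would train an SVM on the chosen columns (for example with scikit-learn's `SVC`) and return its accuracy on held-out data. The ranking step needs no trained model, which matches the abstract's point that it runs in the pre-processing stage.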
Keywords/Search Tags: Cyber Security, Intrusion Detection System, Information Entropy, Iterative Support Vector Machines