| The rapid development of information and Internet technology brings many conveniences to people’s lives, but it also raises many information security problems. These problems affect individuals and enterprises, and can even bring threats and losses of varying degrees to the country. Effective security measures are needed to protect networks and information, and intrusion detection based on data mining has become a key part of network security mechanisms, attracting extensive attention from scholars. As the scale and complexity of data increase, research on intrusion detection systems faces the problem of imbalanced data distribution. Handling imbalanced data effectively has become a challenge for researchers working on intrusion detection systems. This thesis focuses on the imbalance problem in intrusion detection. To improve detection performance on sparse data, two new intrusion detection models are proposed. The first, called the FSVMs model, is based on a sampling algorithm and a fuzzy support vector machine; the second, denoted the F_SDK model, is based on an ensemble feature selection algorithm and multiple classifiers. To improve classification performance on minority-class samples, the FSVMs model is built from a study of a variety of data processing methods and combines a sampling algorithm, a semi-supervised method, and a fuzzy support vector machine. The model first pre-processes the dataset with the synthetic minority oversampling technique (SMOTE), which lets the model learn more information from the minority class, and then feeds the processed data into a semi-supervised support vector machine classifier based on fuzzy theory for training. To evaluate the detection performance of the FSVMs model, we conducted extensive experiments on ten imbalanced subsets from the KDDCup99 
and NSL-KDD datasets. Six performance indicators are used for evaluation: recall, accuracy, precision, false positive rate, F-score, and G-mean. The experimental results show that the FSVMs model improves the detection performance of the intrusion detection system, and that it performs especially well on the two sparsely distributed attack types (U2R and R2L). The F_SDK model is built on research into multiple feature selection algorithms, ensemble methods, and classifier learning theory. First, the model uses an ensemble feature selection algorithm, which combines a correlation-based method and a mutual-information-based method, to extract the important attributes of the dataset and select the most valuable feature subset. The model then trains multiple classifiers on the processed data, obtains several predicted labels for each sample, and determines the final class label by majority voting. To evaluate the detection performance of the F_SDK model, experiments are performed on the NSL-KDD dataset, and five measures (accuracy, precision, recall, F-score, and G-mean) are used to evaluate the results. The experimental results show that the F_SDK model can effectively address the problem of imbalanced data classification and enhances the detection performance of the intrusion detection system on the two sparse attack types (U2R and R2L). |
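
The majority-voting step and the evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `majority_vote` and `binary_metrics`, the toy labels, and the three hypothetical classifier outputs are assumptions made for illustration. The abstract does not define G-mean, so the sketch assumes the common definition, the square root of recall times specificity, for a single attack class treated as the positive class.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions_per_classifier):
    """Return one label per sample: the label most classifiers predicted."""
    final = []
    for votes in zip(*predictions_per_classifier):
        # Ties are resolved in favour of the label seen first (arbitrary choice).
        final.append(Counter(votes).most_common(1)[0][0])
    return final


def binary_metrics(y_true, y_pred, positive):
    """Compute the abstract's measures, treating `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators on very small or one-sided samples.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": ratio(fp, fp + tn),
        "f_score": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: geometric mean of recall and specificity.
        "g_mean": sqrt(recall * specificity),
    }


# Toy data (hypothetical): six samples, two of them the rare U2R class.
y_true = ["normal", "normal", "normal", "normal", "U2R", "U2R"]
clf_a = ["normal", "normal", "normal", "U2R", "U2R", "normal"]
clf_b = ["normal", "normal", "normal", "normal", "U2R", "U2R"]
clf_c = ["normal", "normal", "U2R", "U2R", "U2R", "normal"]

final = majority_vote([clf_a, clf_b, clf_c])
m = binary_metrics(y_true, final, positive="U2R")
print(final)
print(m)
```

On imbalanced data, accuracy can stay high while the minority class is mostly missed, so measures such as recall, F-score, and G-mean give a clearer picture of how sparse attack types like U2R and R2L are handled.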