Research on an Intrusion Detection Scheme Based on Semi-Supervised Learning and Feature Selection

Posted on: 2020-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: K Shi
GTID: 2428330590996459
Subject: Information security
Abstract/Summary:
In recent years, with the rapid development of Internet technology, the growth of the Internet, and the massive volume of data it generates, the Internet and Internet industries have become an indispensable part of people's work and daily life. However, while people enjoy the convenience of the Internet, they also suffer from the cyber security threats it brings. To counter complex and diverse network attack techniques, extensive research has applied machine learning to intrusion detection systems with the goal of proactively ensuring network security. Intrusion detection based on machine learning faces two problems: 1) Obtaining an intrusion detection system with strong generalization ability and accurate detection requires a large amount of labeled data. In a big data environment, however, such a large amount of labeled data is often unavailable, and relying only on expert annotation greatly reduces detection efficiency. 2) Different features in network data are not equally important, and some even reduce detection efficiency or accuracy. There is still no effective solution for feature selection in semi-supervised learning, which hinders the efficiency and accuracy of intrusion detection.

The main research of this thesis consists of the following three parts.

The first part begins by explaining the definition and role of feature selection. Secondly, in order to characterize and quantify different features accurately, the characteristics and defects of typical feature selection algorithms are compared. With its higher computational efficiency, maximum-correlation minimum-redundancy feature selection better balances classification performance against the number of features. Finally, a weighted Euclidean distance is introduced on top of the maximum-correlation feature selection method in order to measure redundancy effectively. To guide the choice of machine learning classification method and to optimize the performance of the semi-supervised classifier, the characteristics and defects of unsupervised and supervised learning algorithms are analyzed, and the CPLE-based semi-supervised algorithm is studied in depth. The following conclusions are obtained: 1) However network attacks change, they are ultimately reflected in the underlying network data. 2) Semi-supervised learning achieves better classification results with a small amount of labeled data and therefore a better detection accuracy rate.

The second part first analyzes the characteristics and defects of unsupervised and supervised learning algorithms and, through comparison, identifies the advantages and characteristics of semi-supervised learning. Secondly, the CPLE-based semi-supervised algorithm is introduced. The algorithm uses the discriminative-model form of semi-supervised learning and iterates with a new EM algorithm, so that it can effectively learn from and classify new data sets. Finally, an intrusion detection scheme based on N-mRMD and CPLE semi-supervised learning is proposed.

The third part first uses the NSL-KDD standard dataset to verify that the scheme can effectively select important features, improve detection efficiency, and indirectly improve detection accuracy. Secondly, it shows that the scheme can use a small amount of labeled data in a new dataset to label a large amount of unlabeled data, improving detection accuracy. Finally, the performance of the system is evaluated by introducing several indicators that measure the scheme's detection effect.
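The maximum-correlation minimum-redundancy idea mentioned in the abstract can be illustrated with a minimal sketch. This is the generic greedy mRMR criterion (relevance to the label minus mean redundancy with already-selected features), with both terms measured by mutual information. It is not the thesis's N-mRMD method, whose weighted-Euclidean-distance redundancy measure is not specified in the abstract. The feature names and the toy data are hypothetical and serve only as an illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mrmr_select(columns, labels, k):
    """Greedy mRMR over discrete features.

    columns: dict mapping feature name -> list of discrete values
    labels:  list of class labels, same length as each column
    k:       number of features to keep
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    selected = []
    remaining = list(columns)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Mean mutual information with the features chosen so far
            redundancy = sum(mutual_information(columns[name], columns[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the label is 1 only when both binary features are 1,
# and "src_bytes_copy" is an exact duplicate of "src_bytes_bin".
columns = {
    "src_bytes_bin":  [0, 0, 0, 0, 1, 1, 1, 1],
    "src_bytes_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "flag_bin":       [0, 0, 1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 0, 0, 0, 0, 1, 1]

selected = mrmr_select(columns, labels, k=2)
print(selected)
```

In this toy case the duplicated feature is as relevant as the original but fully redundant with it, so the redundancy penalty keeps the two copies from being selected together.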
Keywords/Search Tags: intrusion detection, feature selection, semi-supervised learning, maximum correlation, minimum redundancy, weighted Euclidean distance