A Semi-supervised Intrusion Detection Algorithm Based On Natural Neighbor

Posted on: 2017-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Fang
GTID: 2348330503465638
Subject: Computer software and theory
Abstract/Summary:
An intrusion detection system is a new generation of security safeguard that follows firewall technology. It detects abnormal behavior and raises alarms by gathering and analyzing information from key nodes of a computer system or computer network.

Traditional intrusion detection algorithms are based mainly on supervised or unsupervised learning. Algorithms based on supervised learning achieve a high detection rate, but at a high cost: training samples are hard to acquire, and building the training set depends on information security experts. Algorithms based on unsupervised learning need no training set, but their detection rate is clearly lower than that of supervised algorithms. In practice, network traffic contains a large amount of unlabeled data together with some labeled data. Semi-supervised learning can make full use of the labeled data and thereby raise the detection rate.

Semi-supervised intrusion detection based on clustering usually requires the number of clusters to be set in advance. This parameter is hard to choose and often depends on extensive experiments and the user's experience with them. Natural Neighbor (2N) is a new neighbor concept: the algorithm that searches for natural neighbors needs no parameters, and the natural neighbors of each data point are generated adaptively.

This paper combines semi-supervised learning with natural neighbor and proposes a semi-supervised intrusion detection algorithm based on natural neighbor (SID2N). First, the labeled data are clustered by natural neighbor, separately for each attack type. Then, the center of every cluster is computed and used as a training sample for the subsequent classifier. Finally, the unlabeled data are classified based on natural neighbor.
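
The three steps of the proposed algorithm can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation. The abstract does not spell out the natural neighbor search or the classifier, so the code makes three assumptions: the search follows a commonly described formulation (grow the neighborhood size r until the number of points with no reverse neighbor stops changing), clusters are taken as connected components of the natural neighbor graph, and the final step is replaced by a plain nearest-center rule. All function names are hypothetical.

```python
import math


def natural_neighbors(points):
    """Parameter-free neighbor search (assumed formulation): enlarge the
    neighborhood size r until every point has been chosen by someone, or
    the number of never-chosen points stops changing."""
    n = len(points)
    # For every point, the indices of all other points sorted by distance.
    order = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # neighbors collected so far
    reverse_count = [0] * n           # how often each point was chosen
    previous_orphans = None
    r = 0
    while r < n - 1:
        for i in range(n):
            j = order[i][r]           # the (r+1)-th nearest neighbor of i
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        if orphans == 0 or orphans == previous_orphans:
            break
        previous_orphans = orphans
    # i and j are natural neighbors when each lies in the other's neighborhood.
    nn = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return nn, r


def clusters_from_natural_neighbors(points):
    """Assumption: a cluster is a connected component of the neighbor graph."""
    nn, _ = natural_neighbors(points)
    seen, clusters = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in nn[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(component)
    return clusters


def centers_by_label(labeled):
    """labeled maps an attack type to its list of feature tuples.
    Returns (center, label) pairs, one per cluster."""
    centers = []
    for label, pts in labeled.items():
        for component in clusters_from_natural_neighbors(pts):
            dim = len(pts[0])
            center = tuple(
                sum(pts[i][d] for i in component) / len(component)
                for d in range(dim)
            )
            centers.append((center, label))
    return centers


def classify(x, centers):
    """Stand-in classifier (assumption): label of the nearest cluster center."""
    return min(centers, key=lambda c: math.dist(x, c[0]))[1]


# Toy usage with made-up two-dimensional records.
labeled = {"normal": [(0, 0), (0, 1), (1, 0)], "dos": [(10, 10), (10, 11)]}
centers = centers_by_label(labeled)
print(classify((9, 9), centers))
```

Note that no cluster count is passed anywhere: the number of clusters per attack type falls out of the neighbor graph, which is the property the abstract emphasizes.
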
The advantage of the algorithm is that it fully learns the information in the labeled data while requiring no parameters; it is completely adaptive.

The experimental data set consists of 19999 records from the Corrected set of KDD CUP99. The data are first numeralized, standardized and normalized, and 15 of the 41 features are selected according to the results of SPSS analysis and information gain. Part of the data is then labeled, and the proposed algorithm is compared with the SAID semi-supervised intrusion detection algorithm. The results show that the proposed algorithm performs better in detection rate, false positive rate and missed detection rate, which verifies its effectiveness. In addition, the proportion of labeled data is varied: 1/5, 1/4 and 1/3 of the data are labeled in turn, and the experiment is repeated with the proposed algorithm. The results show no obvious variation in detection rate, false positive rate, missed detection rate or detection precision, which verifies the stability of the algorithm.
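
The preprocessing steps and the evaluation measures can be illustrated with a small sketch. The abstract does not define them, so the code assumes common choices: integer coding for symbolic features, z-score standardization, min-max normalization, and rates computed from a binary attack/normal confusion matrix. The function names and the "normal" label are assumptions. Detection precision is left out because its definition is not given.

```python
import statistics


def numeralize(column):
    """Map symbolic values to integer codes in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]


def standardize(column):
    """Z-score standardization; a constant column maps to all zeros."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [0.0 if std == 0 else (v - mean) / std for v in column]


def normalize(column):
    """Min-max scaling to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in column]


def detection_metrics(y_true, y_pred, normal="normal"):
    """Treat every non-normal label as an attack and compute three rates."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t != normal and p != normal for t, p in pairs)  # attack caught
    fn = sum(t != normal and p == normal for t, p in pairs)  # attack missed
    fp = sum(t == normal and p != normal for t, p in pairs)  # false alarm
    tn = sum(t == normal and p == normal for t, p in pairs)  # normal passed
    attacks, normals = tp + fn, fp + tn
    return {
        "detection_rate": tp / attacks if attacks else 0.0,
        "false_positive_rate": fp / normals if normals else 0.0,
        "missed_detection_rate": fn / attacks if attacks else 0.0,
    }


# Toy usage with made-up values.
protocol = numeralize(["tcp", "udp", "tcp", "icmp"])
scaled = normalize(standardize([2.0, 4.0, 6.0]))
truth = ["normal", "normal", "dos", "dos", "probe", "normal"]
guess = ["normal", "dos", "dos", "normal", "probe", "normal"]
print(protocol, scaled, detection_metrics(truth, guess))
```

Under these definitions the detection rate and the missed detection rate always sum to 1, so the two figures carry the same information from opposite sides.
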
Keywords/Search Tags: natural neighbor, semi-supervised learning, intrusion detection