
The Application Of Rare Class Classification Algorithm In Intrusion Detection

Posted on: 2011-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Gu
Full Text: PDF
GTID: 2178360305971643
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network information technology, the security of networked data has become an increasingly challenging problem that a technological society must face and solve, so the research and implementation of intrusion detection systems has become an important subject in computer science and its applications. However, intrusion detection data sets suffer from class distribution imbalance, so the classification algorithms conventionally used for intrusion detection do not achieve good classification results on them. New classification strategies and evaluation methods are therefore needed to deal with the class imbalance problem. Data mining can analyze and process huge volumes of data automatically and efficiently, and can mine them for interesting latent regularities, rules, and patterns. Rare-class classification is an emerging topic in data mining: its goal is to distinguish the rare-class targets, which make up only a small proportion of the data, from a huge data set and to analyze their rules and patterns. The intrusion detection data set can therefore be treated as a rare-class problem, analyzed and processed with dedicated rare-class classification methods, and finally used to obtain a prediction function.

Precision, recall, and the F-value that trades them off are widely used by researchers working on imbalanced data sets as the criteria for evaluating a rare-class classifier. The main strategies for rare-class classification include adjusting the class distribution with sampling techniques, two-phase learning, emerging patterns in rare-class classification algorithms, cost-sensitive learning, and ensemble (integrated) classification techniques. This thesis analyzes and compares a naïve Bayes based cost-sensitive algorithm, a decision tree algorithm, a rule-based classification algorithm, and a proposed ensemble classification algorithm.
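The evaluation criteria named above (precision, recall, and the F-value) can be illustrated with a short sketch. This is not the thesis's own code: the function name `rare_class_scores`, the `beta` parameter, and the toy labels are assumptions made for illustration only. The rare class (here "attack") is treated as the positive class.

```python
def rare_class_scores(y_true, y_pred, rare_label, beta=1.0):
    """Return (precision, recall, F-value) with the rare class as positive.

    Hypothetical helper for illustration; beta=1.0 gives the usual F1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare_label and p == rare_label)
    fp = sum(1 for t, p in pairs if t != rare_label and p == rare_label)
    fn = sum(1 for t, p in pairs if t == rare_label and p != rare_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f_value = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_value


def accuracy(y_true, y_pred):
    """Plain accuracy, shown only for contrast with the rare-class scores."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy data (made up): 10 connections, of which 2 are attacks.
y_true = ["normal"] * 8 + ["attack"] * 2
# A classifier that labels everything "normal" ignores the rare class.
always_normal = ["normal"] * 10
# A classifier that finds one attack and raises one false alarm.
mixed = ["normal"] * 7 + ["attack"] + ["attack", "normal"]

print(accuracy(y_true, always_normal), rare_class_scores(y_true, always_normal, "attack"))
print(accuracy(y_true, mixed), rare_class_scores(y_true, mixed, "attack"))
```

On this toy data both classifiers have the same accuracy, but only the second has a nonzero F-value for the rare class, which is why accuracy alone is a poor criterion on imbalanced data.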
The naïve Bayes algorithm, which is based on the class-conditional independence assumption, is simple and effective. Even when that assumption does not hold, its classification accuracy is comparable to, or even better than, that of the classical decision tree algorithm C4.5. Naïve Bayes remains valid in that case because classifier performance is determined by how the dependencies among attributes are distributed, rather than by the dependencies themselves. For this reason, the ensemble classification algorithm used in the experiments takes naïve Bayes as its base classifier.

First, the synthetic minority over-sampling technique (SMOTE) is used to preprocess the imbalanced data set; the classification algorithms above are then applied to the data to obtain the prediction function and the receiver operating characteristic (ROC) curve. The naïve Bayes based cost-sensitive learning and boosting ensemble is a combined algorithm. It builds on the ensemble classification technique for rare classes: naïve Bayes is used to build the base classifiers, boosting is applied over several rounds of independent learning, and cost-sensitive learning is incorporated into the learning process. Finally, the naïve Bayes based cost-sensitive learning algorithm, the rule-based classification algorithm, the decision tree algorithm, and the naïve Bayes based cost-sensitive learning and boosting ensemble algorithm are tested on the KDDCUP'99 intrusion detection data set from the UCI machine learning repository. The experiments show that the ensemble algorithm obtains relatively good results on rare-class classification problems at a lower computational cost, and that it demonstrates computational efficiency, adaptive learning, and classification effectiveness in practical rare-class classification.
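The over-sampling step mentioned above can be sketched as follows. This is a simplified illustration of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the thesis's implementation. The function name `smote_sketch`, its parameters, and the toy points are assumptions; the sketch handles numeric features only and picks base samples at random rather than looping over every minority sample.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation.

    Hypothetical helper: `minority` is a list of equal-length numeric tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random point on the segment between base and the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class (made-up 2-D points standing in for attack records).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the rare class instead of being exact duplicates, which is the motivation for using this kind of over-sampling before training.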
Keywords/Search Tags: data mining, rare-class, classification algorithm, naïve Bayes, intrusion detection system