Research On Classification Algorithm Based On Cost-sensitive For Intrusion Detection

Posted on: 2006-11-28; Degree: Master; Type: Thesis
Country: China; Candidate: H A Kang; Full Text: PDF
GTID: 2168360155974267; Subject: Computer application technology
Abstract/Summary:
Intrusion detection supplements traditional network security defence mechanisms. It has become an important research direction in the field because it effectively improves the protection of networks and systems. To address low adaptability, false alarms, false positives, and data overload, data mining has been applied to intrusion detection. Data mining is a direct application of machine learning techniques that automatically computes models from a data warehouse in a dynamically changing environment. Two of the most important goals in machine learning are accuracy and training efficiency, so prior work has made a number of simplifying assumptions. One is that all features in a dataset are freely acquired, with no computational or monetary cost. Another is that every example in a learning set is equally important, so uniform costs are assumed. These assumptions are unrealistic for many applications. In intrusion detection, different feature calculations and different misclassifications incur significantly different costs, so only a lightweight model that predicts the labels of incoming connections is useful.

The thesis concentrates on the problems of cost-sensitivity. Because intrusion detection deals with massive data whose patterns also change, it places strict requirements on feature cost and misclassification cost. The thesis looks for general, algorithm-independent solutions to these problems that allow a wide range of existing inductive learning algorithms to be plugged in with ease. For this reason it applies ensembles of classifiers.

The thesis introduces an n-step sequential ensemble approach to reduce operational cost in real-time model exploitation. In this model, low-cost classifiers are always applied first, and expensive classifiers are employed only if the less expensive classifiers fail to deliver accurate predictions.
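The cheapest-first idea behind the sequential ensemble can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `Stage` structure, the confidence thresholds, the feature names (`failed_logins`, `conn_rate`), and the cost values are all hypothetical, and "fails to deliver an accurate prediction" is assumed here to mean that a stage's confidence falls below its threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class Stage:
    name: str
    feature_cost: float  # hypothetical cost of computing this stage's features
    predict: Callable[[Features], Tuple[str, float]]  # returns (label, confidence)
    threshold: float  # minimum confidence needed to accept the prediction


def classify_sequentially(stages: List[Stage], x: Features) -> Tuple[str, float]:
    """Return (label, total operational cost) for one connection record."""
    ordered = sorted(stages, key=lambda s: s.feature_cost)  # cheapest first
    total_cost = 0.0
    label = "unknown"
    for i, stage in enumerate(ordered):
        total_cost += stage.feature_cost
        label, confidence = stage.predict(x)
        is_last = i == len(ordered) - 1
        # Stop as soon as a stage is confident; the last stage always decides.
        if confidence >= stage.threshold or is_last:
            break
    return label, total_cost


def cheap_rule(x: Features) -> Tuple[str, float]:
    # Toy classifier using only an inexpensive feature (assumed name).
    if x["failed_logins"] >= 5:
        return "attack", 0.95
    return "normal", 0.60


def expensive_rule(x: Features) -> Tuple[str, float]:
    # Toy classifier using a costlier windowed statistic (assumed name).
    if x["conn_rate"] > 100.0:
        return "attack", 0.90
    return "normal", 0.90


STAGES = [
    Stage("expensive", 100.0, expensive_rule, 0.0),
    Stage("cheap", 1.0, cheap_rule, 0.9),
]

if __name__ == "__main__":
    print(classify_sequentially(STAGES, {"failed_logins": 7, "conn_rate": 10.0}))
    print(classify_sequentially(STAGES, {"failed_logins": 0, "conn_rate": 150.0}))
```

In this toy setup the first record is settled by the cheap stage alone, while the second falls through to the expensive stage and pays the cost of both.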
With this sequential ensemble, the expected operational cost is significantly lower than that of a monolithic classifier built from all available features, while prediction accuracy remains unaffected.

Misclassification cost is the main focus of the thesis. The Adaptive Boosting algorithm is an iterative machine learning procedure that successively generates base classifiers on a weighted version of the sample and then re-weights the sample according to how successful the classification was. Its main idea is to form a strong learner from a simple combination of weak learners in order to improve the recognition rate. The optimized method differs from Adaptive Boosting in two ways: every input sample carries an additional misclassification cost factor, and the weight-updating rule includes a cost adjustment function. The cost factor of a sample is directly proportional to its misclassification cost. The updating rule increases the weights of costly misclassifications more aggressively. As a result, each successive weak classifier is trained on a distribution that emphasizes expensive examples and predicts more of them correctly, and the final voted ensemble correctly predicts more costly instances.

The optimized method changes classification accuracy very little, but when the classifiers are combined, the weights of the weak learners can be chosen to minimize the upper bound of the cumulative misclassification cost. Simulation experiments show that the optimized method achieves a significant reduction in misclassification cost over the Adaptive Boosting algorithm with fewer rounds of boosting.
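A cost-adjusted weight update of the kind described above might look like the sketch below. The abstract does not give the thesis's actual cost adjustment function, so the linear form used here (similar in spirit to AdaCost-style boosting, with costs normalized to [0, 1]) is an assumption, as are the function names `cost_adjustment` and `update_weights`.

```python
import math
from typing import List, Sequence


def cost_adjustment(cost: float, correct: bool) -> float:
    """Assumed linear adjustment; `cost` is a cost factor normalized to [0, 1].

    Misclassified samples get a larger factor as their cost grows;
    correctly classified samples get a smaller one.
    """
    return 0.5 - 0.5 * cost if correct else 0.5 + 0.5 * cost


def update_weights(
    weights: Sequence[float],
    labels: Sequence[int],
    predictions: Sequence[int],
    costs: Sequence[float],
    alpha: float,
) -> List[float]:
    """One boosting round: re-weight the samples, then normalize to sum to 1."""
    updated = []
    for w, y, h, c in zip(weights, labels, predictions, costs):
        correct = y == h
        margin = 1.0 if correct else -1.0
        # Plain AdaBoost would use exp(-alpha * margin); the extra factor
        # makes the update depend on each sample's misclassification cost.
        updated.append(w * math.exp(-alpha * margin * cost_adjustment(c, correct)))
    total = sum(updated)
    return [w / total for w in updated]


if __name__ == "__main__":
    new_weights = update_weights(
        weights=[0.25, 0.25, 0.25, 0.25],
        labels=[1, 1, -1, -1],
        predictions=[-1, -1, -1, -1],  # the first two samples are misclassified
        costs=[1.0, 0.1, 0.5, 0.5],
        alpha=1.0,
    )
    print(new_weights)
```

Of the two misclassified samples, the one with the higher cost factor receives the larger weight for the next round, which is the behaviour the abstract attributes to the modified updating rule.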
Keywords/Search Tags: Intrusion detection, Classification, Machine learning, Cumulative misclassification cost, Weight updating