
Research On Feature Selection And Ensemble Learning And Its Application On Intrusion Detection

Posted on: 2009-05-09   Degree: Master   Type: Thesis
Country: China   Candidate: Y X Qiu
GTID: 2178360245476390   Subject: Computer application technology
Abstract/Summary:
With the emergence of high-dimensional data in machine learning domains such as intrusion detection, existing feature selection and machine learning algorithms face serious challenges, and there is an urgent need for algorithms with better overall performance in terms of accuracy, efficiency, and related criteria. This thesis studies feature selection and ensemble learning and their application to intrusion detection. The main contributions are:

1. A margin-based method for evaluating feature subsets. The method judges the quality of a feature subset produced by a rough-set-based feature selection algorithm by computing the margin of that subset. Experiments show that, in most cases, among subsets of the same cardinality, the subset with the larger margin achieves classification performance better than or comparable to that of subsets with smaller margins.

2. A feature selection algorithm based on a mixed discernibility matrix. Traditional rough-set feature selection cannot directly handle mixed data containing both discrete and continuous features; by constructing a mixed discernibility matrix, this algorithm overcomes that limitation. Experimental results show that classifiers built on the selected feature subsets perform better than or comparably to classifiers built on all features.

3. A new supervised ensemble learning algorithm built on feature selection and suited to high-dimensional data. Because the algorithm selects feature subsets with a margin-based evaluation criterion and therefore builds individual classifiers with higher accuracy, it improves the performance of the ensemble. Experiments show that it achieves higher classification performance than the traditional feature-selection-based ensemble learning algorithm.
   In addition, Bagging is incorporated into this feature-selection-based ensemble learning, which effectively improves the performance of the classifier.

4. A semi-supervised learning algorithm based on ensemble learning. The algorithm uses a classifier ensemble to label the unlabeled samples it predicts with high confidence and then uses them to update the classification model. This addresses the problem of ensuring detection efficiency when training data are scarce, as is common in real intrusion detection systems. Experimental results show that the algorithm effectively reduces the number of training samples required and improves the classification performance of the intrusion detection system while lowering both the false negative rate and the false positive rate.
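Contribution 1 ranks feature subsets by their margin, but the abstract does not define the margin. The sketch below is a minimal illustration under one assumption: it uses the hypothesis margin familiar from Relief-style feature weighting (half the difference between a sample's distance to its nearest other-class sample and its distance to its nearest same-class sample), computed on the selected features only and averaged over samples. In the thesis the candidate subsets come from rough-set reduction algorithms; here they are simply passed in. The names `subset_margin` and `best_subset` are hypothetical.

```python
import math

def _dist(a, b, features):
    """Euclidean distance between samples a and b, using only the given feature indices."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

def subset_margin(X, y, features):
    """Mean hypothesis margin of a labelled dataset under a feature subset (assumed definition).

    For each sample: 0.5 * (distance to nearest sample of another class
                            - distance to nearest other sample of the same class).
    Assumes at least two classes and at least two samples per class.
    """
    total = 0.0
    for i in range(len(X)):
        hit = min(_dist(X[i], X[j], features)
                  for j in range(len(X)) if j != i and y[j] == y[i])
        miss = min(_dist(X[i], X[j], features)
                   for j in range(len(X)) if y[j] != y[i])
        total += 0.5 * (miss - hit)
    return total / len(X)

def best_subset(X, y, candidates):
    """Among candidate subsets (e.g. of equal cardinality), return the one with the largest margin."""
    return max(candidates, key=lambda s: subset_margin(X, y, s))
```

For example, with `X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]` and `y = [0, 0, 1, 1]`, feature 0 separates the classes and feature 1 does not, so `best_subset(X, y, [[0], [1]])` picks `[0]`: its margin is about 0.425, while the margin of `[1]` is -1.5.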
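Contribution 2 builds a discernibility matrix that works on mixed discrete and continuous data. The exact construction is not given in the abstract, so the following is only a plausible sketch under stated assumptions: a discrete feature discerns two objects when their values differ, a continuous feature discerns them when their values differ by more than a hypothetical threshold `delta`, entries are kept only for object pairs with different class labels, and a reduct is obtained by taking the core (features that alone discern some pair) and then greedily adding the feature that appears in the most remaining entries. The function names and `delta` are illustrative, not the thesis's own.

```python
from collections import Counter
from itertools import combinations

def mixed_discernibility_matrix(X, y, continuous, delta=0.1):
    """Non-empty discernibility entries for every pair of objects with different labels.

    Discrete feature f discerns objects i and j if X[i][f] != X[j][f];
    continuous feature f (index in `continuous`) discerns them if
    |X[i][f] - X[j][f]| > delta (delta is an assumed threshold).
    """
    n_features = len(X[0])
    entries = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] == y[j]:
            continue  # only pairs with different decisions need to be discerned
        entry = frozenset(
            f for f in range(n_features)
            if (abs(X[i][f] - X[j][f]) > delta if f in continuous
                else X[i][f] != X[j][f])
        )
        if entry:  # an empty entry means an inconsistent pair that no feature can discern
            entries.append(entry)
    return entries

def greedy_reduct(entries):
    """Core features first, then greedily add the feature covering the most remaining entries."""
    selected = {f for e in entries if len(e) == 1 for f in e}  # core
    remaining = [e for e in entries if not (e & selected)]
    while remaining:
        counts = Counter(f for e in remaining for f in e)
        best = max(sorted(counts), key=lambda f: counts[f])  # ties go to the smallest index
        selected.add(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(selected)
```

On a toy connection table with a discrete protocol (feature 0), a continuous duration (feature 1), and a discrete flag (feature 2), e.g. `X = [["tcp", 0.10, "SF"], ["tcp", 0.12, "S0"], ["udp", 0.90, "SF"], ["udp", 0.11, "SF"]]` with labels `["normal", "attack", "attack", "normal"]` and `continuous={1}`, the reduct keeps the duration and the flag (`[1, 2]`) and drops the protocol, without first discretizing the continuous feature.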
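Contribution 3 and its Bagging extension combine bootstrap sampling with margin-based feature selection for each ensemble member. A minimal sketch, assuming choices the abstract does not specify: each member draws a bootstrap sample, generates a few random candidate feature subsets of fixed size, keeps the candidate with the largest (assumed hypothesis) margin on its bootstrap sample, and uses a 1-nearest-neighbour classifier on those features; predictions are combined by majority vote. The random candidate generation, the 1-NN base learner, and all parameter names are assumptions for illustration.

```python
import math
import random
from collections import Counter

def dist(a, b, feats):
    """Euclidean distance restricted to the feature indices in `feats`."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in feats))

def margin(X, y, feats):
    """Mean hypothesis margin; samples lacking a same- or other-class neighbour are skipped."""
    vals = []
    for i in range(len(X)):
        hit = min((dist(X[i], X[j], feats) for j in range(len(X))
                   if j != i and y[j] == y[i]), default=math.inf)
        miss = min((dist(X[i], X[j], feats) for j in range(len(X))
                    if y[j] != y[i]), default=math.inf)
        if hit < math.inf and miss < math.inf:
            vals.append(0.5 * (miss - hit))
    return sum(vals) / len(vals) if vals else -math.inf

def nn_predict(X, y, feats, x):
    """1-nearest-neighbour prediction using only `feats`."""
    best = min(range(len(X)), key=lambda j: dist(X[j], x, feats))
    return y[best]

def train_margin_bagging(X, y, n_members=7, subset_size=2, n_candidates=5, seed=0):
    """Each member: a bootstrap sample plus the largest-margin subset among random candidates."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (Bagging)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = [sorted(rng.sample(range(d), subset_size))
                      for _ in range(n_candidates)]
        feats = max(candidates, key=lambda f: margin(Xb, yb, f))  # margin-based selection
        members.append((Xb, yb, feats))
    return members

def predict(members, x):
    """Majority vote over the members' 1-NN predictions."""
    votes = Counter(nn_predict(Xb, yb, feats, x) for Xb, yb, feats in members)
    return votes.most_common(1)[0][0]
```

The design intent this sketch tries to capture is the one stated for contribution 3: margin-based selection steers each member toward informative features, so the individual classifiers are more accurate, while bootstrap sampling keeps the members diverse.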
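Contribution 4 labels high-confidence unlabelled samples with a classifier ensemble and feeds them back into training. The sketch below is a generic ensemble self-training loop, not the thesis's exact procedure: it assumes that confidence is the fraction of Bagging members (1-NN models on bootstrap samples) that agree on a label, and that a sample is accepted when this agreement reaches a hypothetical `threshold`. The loop stops when the pool is empty, no sample is confident, or `max_rounds` is reached; all names and defaults are illustrative.

```python
import math
import random
from collections import Counter

def nn_label(X, y, x):
    """1-nearest-neighbour label using Euclidean distance on all features."""
    best = min(range(len(X)), key=lambda k: math.dist(X[k], x))
    return y[best]

def build_ensemble(X, y, n_members, rng):
    """Bagging: each member is a bootstrap sample of the labelled set used as a 1-NN model."""
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        members.append(([X[i] for i in idx], [y[i] for i in idx]))
    return members

def vote(members, x):
    """Majority label and the fraction of members that agree with it (assumed confidence)."""
    votes = Counter(nn_label(Xm, ym, x) for Xm, ym in members)
    label, count = votes.most_common(1)[0]
    return label, count / len(members)

def ensemble_self_training(X_lab, y_lab, X_unlab, n_members=9,
                           threshold=0.8, max_rounds=5, seed=0):
    """Iteratively move high-agreement unlabelled samples into the labelled set."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        members = build_ensemble(X_lab, y_lab, n_members, rng)
        still_unlabelled = []
        added = 0
        for x in pool:
            label, agreement = vote(members, x)
            if agreement >= threshold:
                X_lab.append(x)  # pseudo-label accepted
                y_lab.append(label)
                added += 1
            else:
                still_unlabelled.append(x)
        pool = still_unlabelled
        if added == 0:
            break  # no confident samples this round; stop early
    return X_lab, y_lab, pool
```

The returned labelled set can then be used to build the final ensemble with `build_ensemble`. Requiring strong agreement among members is what limits the risk of feeding wrong pseudo-labels back into the model when only a few labelled connections are available.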
Keywords/Search Tags: intrusion detection, feature selection, ensemble learning, mixed discernibility matrix, semi-supervised learning