
Research On Feature Selection Approach And Its Application In Network Traffic Identification

Posted on: 2013-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: F H Yang
GTID: 2218330371957535
Subject: Computer application technology

Abstract/Summary:
Feature selection is the task of choosing, from the original feature set, a subset of features that optimizes a specific evaluation criterion. It is an active research topic in information science and an important technique in data mining, machine learning and pattern recognition. The emergence of high-dimensional datasets poses severe challenges to existing feature selection and machine learning algorithms. Feature selection is also needed in network traffic classification, because a network flow can be described by hundreds of characteristics.

This thesis first reviews the basics of feature selection and introduces two typical feature selection methods: the ReliefF algorithm and the mutual information (MI) measure. To exploit the advantages of both, and because setting reasonable thresholds is critical to the algorithm's performance, the thesis proposes a multiclass feature selection algorithm called RF-MI. RF-MI obtains an optimal feature subset by removing irrelevant and redundant features from the original feature set using ReliefF and the MI measure, adjusting the feature weight threshold and the correlation threshold according to the performance of a classification algorithm, and repeating these steps until the best classification performance is reached. Experiments show that RF-MI outperforms the other algorithms compared in shrinking the feature set while maintaining good classification accuracy.

Feature selection methods can be broadly divided into two categories: the filter model and the wrapper model. The filter model runs fast, while the wrapper model usually gives better results. To exploit the advantages of both, the thesis proposes a feature selection algorithm based on mutual information and a genetic algorithm, called ISU-GA.
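A filter-plus-wrapper hybrid in the spirit of ISU-GA can be sketched in Python. The abstract does not define the ISU criterion or the genetic-algorithm settings, so this sketch substitutes the standard symmetric uncertainty, SU(A,B) = 2 I(A;B) / (H(A) + H(B)), as the filter score. Everything else is our assumption as well: equal-width discretization into four bins, features pre-scaled to [0, 1], a leave-one-out 1-nearest-neighbour classifier as the wrapper's evaluator, a small per-feature penalty in the fitness, and the GA parameters. The names `su_ga_select` and `loo_1nn_accuracy` are hypothetical, not the thesis's code.

```python
# Hypothetical sketch: SU filter + GA wrapper (not the thesis's ISU-GA code).
import math
import random
from collections import Counter


def entropy(a):
    """Plug-in Shannon entropy (nats) of a discrete sequence."""
    n = len(a)
    return -sum(c / n * math.log(c / n) for c in Counter(a).values())


def symmetric_uncertainty(a, b):
    """SU(A,B) = 2 I(A;B) / (H(A) + H(B)), normalised to [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    mi = ha + hb - entropy(list(zip(a, b)))  # I(A;B) = H(A) + H(B) - H(A,B)
    return 2.0 * mi / (ha + hb)


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in feats))
        correct += y[j] == y[i]
    return correct / len(X)


def su_ga_select(X, y, keep=5, bins=4, pop=12, gens=15, penalty=0.01, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    cols = [[min(int(row[f] * bins), bins - 1) for row in X] for f in range(d)]
    # Filter stage: keep the `keep` features with the highest SU against the class
    cand = sorted(range(d), key=lambda f: -symmetric_uncertainty(cols[f], y))[:keep]
    cache = {}

    def fitness(mask):
        # Wrapper stage: classifier accuracy minus a small per-feature penalty
        key = tuple(mask)
        if key not in cache:
            feats = [f for f, bit in zip(cand, mask) if bit]
            cache[key] = loo_1nn_accuracy(X, y, feats) - penalty * len(feats)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in cand] for _ in range(pop)]
    for _ in range(gens):
        ranked = sorted(population, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop:
            p1, p2 = rng.sample(ranked[: pop // 2], 2)  # parents from the better half
            cut = rng.randrange(1, len(cand)) if len(cand) > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - b if rng.random() < 0.1 else b for b in child]  # bit-flip mutation
            nxt.append(child)
        population = nxt
    best = max(population, key=fitness)
    return [f for f, bit in zip(cand, best) if bit], fitness(best)


# Toy data: f0 separates the classes, f1 duplicates f0, f2..f5 are noise.
rng = random.Random(1)
X, y = [], []
for i in range(60):
    c = i % 3
    base = (c + 0.5 * rng.random()) / 3
    X.append([base, base] + [rng.random() for _ in range(4)])
    y.append(c)
feats, score = su_ga_select(X, y)
print(feats, score)
```

SU alone scores f0 and its copy f1 identically; only the per-feature penalty in the fitness gives the GA a reason to drop one of them. The SU ranking is computed once, while every new chromosome triggers a full leave-one-out evaluation, which is why the wrapper stage dominates the running time in a design like this.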
Experiments on UCI datasets show that ISU-GA performs well overall in terms of accuracy, feature subset size and efficiency.

Finally, RF-MI and ISU-GA are applied to network traffic classification. Experiments on the Trace Andrew datasets show that, with thresholds set adaptively and reasonably, both algorithms reduce the number of features significantly without impairing classification accuracy. Since fewer features mean lower time and space complexity for classification models, this indicates that RF-MI and ISU-GA are effective and feasible. Considering both classification accuracy and the degree of dimensionality reduction, ISU-GA performs better, but its running time is still higher.
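The RF-MI procedure summarized in the abstract (ReliefF weights to drop irrelevant features, MI to drop redundant ones, both thresholds tuned against a classifier) can be sketched in Python under several assumptions of ours: the ReliefF variant is simplified (every instance is sampled, Manhattan distance, features pre-scaled to [0, 1]), MI is a plug-in estimate on four equal-width bins, the classification algorithm is a leave-one-out 1-nearest-neighbour classifier, and "adjusting the thresholds" is modelled as a small grid search over hypothetical `deltas` and `taus`. All function names are hypothetical, not the thesis's code.

```python
# Hypothetical sketch of an RF-MI-style selector (not the thesis's code).
import itertools
import math
import random
from collections import Counter


def relieff_weights(X, y, k=3):
    """Simplified multiclass ReliefF. Every instance is used as a sample and
    features are assumed to be scaled to [0, 1]."""
    n, d = len(X), len(X[0])
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        for c in priors:
            # k nearest neighbours of X[i] inside class c (Manhattan distance)
            near = sorted(
                (sum(abs(p - q) for p, q in zip(X[i], X[j])), j)
                for j in range(n) if j != i and y[j] == c
            )[:k]
            if not near:
                continue
            # nearest hits lower a weight; misses raise it, scaled by class prior
            scale = -1.0 if c == y[i] else priors[c] / (1.0 - priors[y[i]])
            for _, j in near:
                for f in range(d):
                    w[f] += scale * abs(X[i][f] - X[j][f]) / (n * len(near))
    return w


def mutual_info(a, b):
    """Plug-in estimate of I(A;B) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[u] * pb[v])) for (u, v), c in pab.items())


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in feats))
        correct += y[j] == y[i]
    return correct / len(X)


def rf_mi(X, y, deltas, taus, bins=4):
    """Grid-search the weight threshold delta and the redundancy threshold tau."""
    w = relieff_weights(X, y)
    cols = [[min(int(row[f] * bins), bins - 1) for row in X] for f in range(len(w))]
    best_acc, best_feats = -1.0, []
    for delta, tau in itertools.product(deltas, taus):
        # 1) relevance: keep features whose ReliefF weight exceeds delta
        ranked = sorted((f for f in range(len(w)) if w[f] > delta), key=lambda f: -w[f])
        # 2) redundancy: skip a feature sharing more than tau nats with a kept one
        kept = []
        for f in ranked:
            if all(mutual_info(cols[f], cols[g]) <= tau for g in kept):
                kept.append(f)
        acc = loo_1nn_accuracy(X, y, kept)
        # prefer higher accuracy, then the smaller subset
        if acc > best_acc or (acc == best_acc and len(kept) < len(best_feats)):
            best_acc, best_feats = acc, kept
    return best_feats, best_acc


# Toy data: f0 separates the classes, f1 duplicates f0, f2 is noise.
rng = random.Random(0)
X, y = [], []
for i in range(60):
    c = i % 3
    base = (c + 0.5 * rng.random()) / 3
    X.append([base, base, rng.random()])
    y.append(c)
feats, acc = rf_mi(X, y, deltas=[0.0, 0.05, 0.1], taus=[0.2, 0.5])
print(feats, acc)
```

The tie-break toward smaller subsets mirrors the stated goal of shrinking the feature set while keeping accuracy. On the toy data, f0 and its duplicate f1 share at least ln 3 ≈ 1.10 nats (the classes are balanced and the bins refine them), which is above both τ values, so the redundancy step never keeps both.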
Keywords/Search Tags: Feature Selection, ReliefF, Mutual Information, Genetic Algorithm, Supervised Learning, Traffic Identification