
Ensemble Feature Selection Based On Evidence Accumulation And Its Application In Network Traffic Analysis

Posted on: 2024-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: T Wu
GTID: 2568306935499704
Subject: Computer Science and Technology
Abstract/Summary:
As an important data analysis method, feature selection faces a major challenge: no single feature selection method performs well on every data set encountered in practice. Numerous studies have shown that ensemble learning is a promising way to overcome the shortcomings of individual feature selection methods and to achieve higher and more stable performance. However, current ensemble feature selection methods usually have the following limitations. (1) The relationships between features are not explored in depth, so features that are strongly related to important features may be ignored. (2) The consensus strategy usually relies on simple voting, which may miss important features that receive only a few votes. (3) Most feature importance evaluation methods compute a feature's weight from the number of votes it receives from the base feature selectors. This one-sided evaluation is prone to biased results with significant deviations.

To address these issues, this thesis proposes an ensemble feature selection method based on an enhanced co-association matrix. The traditional co-association matrix used in evidence accumulation has been applied to clustering ensembles, but, to the best of our knowledge, it has not been applied to ensemble feature selection, because in its traditional form it cannot be used directly for that purpose. We recognized the potential value of the co-association matrix and redesigned it for ensemble feature selection. The main contributions of this thesis are as follows.

(1) An ensemble feature selection method based on an enhanced co-association matrix (ECM-EFS) is proposed. Three fine-grained matrices are introduced, namely the positive co-association matrix (PCM), the negative co-association matrix (NCM), and the relative co-association matrix (RCM). They not only enrich the information in the traditional co-association matrix but also reveal the relative relationships between features. Fourteen data sets from the UCI repository are used to verify the stability and robustness of ECM-EFS, and extensive experiments comparing ECM-EFS with five state-of-the-art ensemble feature selection algorithms demonstrate its effectiveness.

(2) A new consensus strategy based on the enhanced co-association matrix is proposed. Instead of simple voting, it takes into account all results produced by the base feature selectors, the importance of each feature, and the relationships between features before producing the final feature selection result.

(3) The notion of a Feature Kernel is introduced: a set that contains the most important features and also serves as the starting point for feature selection. An algorithm is presented that constructs the feature kernel and then selects subsequent features starting from it.

(4) ECM-EFS is applied to real network traffic data covering network behavior scenarios such as mining traffic identification, video traffic identification, and network application identification. Experimental results show that ECM-EFS outperforms the other ensemble feature selection methods, and that the feature kernel performs better on high-dimensional network traffic data.
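To make the matrix idea more concrete, here is a minimal sketch that builds pairwise matrices from the binary outputs of several base feature selectors. It follows the spirit of the co-association matrix from evidence accumulation clustering, where entry (i, j) counts how often two items end up together across ensemble members. The abstract does not give formal definitions of PCM, NCM, or RCM. The three matrices here are therefore assumptions for illustration only: `pcm` counts selectors that keep both features, `ncm` counts selectors that drop both, and `rcm` counts selectors that keep the first and drop the second. The function names and the toy data are also hypothetical. A simple majority vote is included as the baseline consensus that the thesis argues against.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def co_association(selections: List[List[int]]) -> Tuple[Matrix, Matrix, Matrix]:
    """Pairwise agreement matrices over M base-selector outputs.

    selections[m][i] == 1 if base selector m kept feature i, else 0.
    Hypothetical readings (not the thesis's formal definitions):
      pcm[i][j]: share of selectors that kept both i and j
      ncm[i][j]: share of selectors that dropped both i and j
      rcm[i][j]: share of selectors that kept i but dropped j (asymmetric)
    """
    m, d = len(selections), len(selections[0])
    both_kept = [[0] * d for _ in range(d)]
    both_dropped = [[0] * d for _ in range(d)]
    kept_over = [[0] * d for _ in range(d)]
    for s in selections:
        for i in range(d):
            for j in range(d):
                if s[i] and s[j]:
                    both_kept[i][j] += 1
                elif not s[i] and not s[j]:
                    both_dropped[i][j] += 1
                elif s[i]:
                    kept_over[i][j] += 1  # i kept, j dropped

    # Divide the integer counts once so each entry is an exact fraction of m.
    def norm(counts: List[List[int]]) -> Matrix:
        return [[c / m for c in row] for row in counts]

    return norm(both_kept), norm(both_dropped), norm(kept_over)


def majority_vote(selections: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Baseline consensus: keep features chosen by at least `threshold` of the selectors."""
    m, d = len(selections), len(selections[0])
    return [i for i in range(d) if sum(s[i] for s in selections) / m >= threshold]


# Four toy base selectors over five features (made-up data for illustration).
selections = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
pcm, ncm, rcm = co_association(selections)
print(majority_vote(selections))  # [0, 1]
print(pcm[0][1], ncm[2][3], rcm[0][1])  # 0.75 0.5 0.25
```

The diagonal of `pcm` is each feature's vote share, so simple voting effectively uses only that diagonal. The off-diagonal entries carry the pairwise evidence that a co-association-based consensus can exploit. For any pair i ≠ j, `pcm[i][j] + ncm[i][j] + rcm[i][j] + rcm[j][i]` equals 1, because every selector falls into exactly one of the four keep/drop combinations.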
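The Feature Kernel idea can be sketched as a kernel-then-expand loop. The abstract does not describe the construction algorithm, so every part of this sketch is a hypothetical stand-in:

- the function `grow_from_kernel`;
- the `kernel_threshold` rule for forming the kernel;
- the greedy score, which mixes a feature's own importance with its mean association to the features already chosen, weighted by `alpha`.

`importance` could hold vote shares and `relation` a co-association-style matrix, but any scores in [0, 1] work.

```python
from typing import List, Sequence


def grow_from_kernel(
    importance: Sequence[float],
    relation: Sequence[Sequence[float]],
    kernel_threshold: float,
    k: int,
    alpha: float = 0.5,
) -> List[int]:
    """Hypothetical kernel-then-expand selection (illustrative, not the thesis's algorithm).

    importance[i]: per-feature score in [0, 1], e.g. a base-selector vote share.
    relation[i][j]: pairwise association in [0, 1], e.g. a co-association entry.
    Returns feature indices in the order they were selected.
    """
    d = len(importance)
    # Kernel: every feature whose importance clears the threshold.
    selected = [i for i in range(d) if importance[i] >= kernel_threshold]
    if not selected:
        # Fallback: use the single most important feature as a one-element kernel.
        selected = [max(range(d), key=lambda i: importance[i])]

    def score(j: int) -> float:
        # Blend j's own importance with its mean association to the current selection.
        link = sum(relation[i][j] for i in selected) / len(selected)
        return alpha * importance[j] + (1 - alpha) * link

    # The kernel is always kept whole, so the result may be longer than k.
    while len(selected) < min(k, d):
        candidates = [j for j in range(d) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy scores (made up): features 0 and 1 form the kernel; feature 2 is weakly
# important on its own but strongly associated with the kernel.
importance = [0.9, 0.8, 0.3, 0.4]
relation = [
    [1.0, 0.7, 0.9, 0.1],
    [0.7, 1.0, 0.8, 0.0],
    [0.9, 0.8, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]
print(grow_from_kernel(importance, relation, kernel_threshold=0.7, k=3))  # [0, 1, 2]
```

In this toy example, thresholding importance at 0.5 would keep only features 0 and 1. Feature 2 has a lower importance score than feature 3, yet it is added first because it is strongly associated with the kernel {0, 1}. This is a simplified illustration of the concern that features strongly related to important features may be ignored.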
Keywords/Search Tags: ensemble feature selection, evidence accumulation, feature kernel, relative co-association matrix (RCM)