
Using A Weighted Network Graph Clustering And Subspace Ensemble Approach For High-dimension Data Classification

Posted on: 2018-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: S N Xie
Full Text: PDF
GTID: 2348330569486439
Subject: Computer Science and Technology
Abstract/Summary:
Feature selection is an important preprocessing step in pattern recognition and machine learning. In recent years, advances in technology have led to a rapid expansion of high-dimensional data. As a result, irrelevant and redundant features have made feature selection a challenging task in terms of both efficiency and effectiveness. Moreover, improving stability and robustness while maintaining accuracy has become a serious problem. Feature selection is therefore needed to eliminate irrelevant and redundant features, which helps to reduce execution time and improve prediction performance. To address these problems, this thesis proposes a feature selection model based on weighted network graph clustering (WNGC). A subspace ensemble based on the accuracy and difference of base classifiers is then used to improve the stability and robustness of WNGC for high-dimensional data classification. The main research work of the thesis is as follows:

1. To deal with high-dimensional data that contains irrelevant and redundant features, a feature selection method based on weighted network graph clustering (WNGC) is proposed. Firstly, to remove irrelevant features, symmetrical uncertainty is employed to measure the correlation between each feature and the label set. Secondly, the similarity among the remaining features is represented as a weighted network graph, and a community detection algorithm is used to cluster the features. Finally, to select a final feature subset made up of representative features, an iterative search strategy based on maximum information and minimum redundancy is proposed. WNGC is thus able to both explore and exploit candidate feature subsets.

2. To further improve the stability and robustness of feature selection, the thesis designs a subspace ensemble based on double weighted network graph clustering (SE-DWNGC). Firstly, the training samples are perturbed several times, and WNGC is applied to the perturbed samples to construct different candidate feature sets. To select a feature subset composed of stable and relevant features, a rank aggregation method is used to integrate all candidate features. This reduces the search space, lessens the impact of irrelevant and redundant features on classification performance, and improves the efficiency of the model. Secondly, to obtain feature subspaces that have strong discriminative ability while remaining different from one another, WNGC groups the stable feature subset again so that related feature pairs fall into the same clusters. Some features are then selected at random from each group to form one discriminative feature subspace. To improve generalization performance, a subspace selective ensemble based on double weighted network graph clustering (S-SW-DWNGC) is designed, in which the accuracy and difference of the base classifiers are used to filter out redundant base classifiers. Finally, majority voting is adopted to classify the test samples.

The effectiveness of the proposed scheme is tested on UCI data, text data, and microarray data. The experimental results show that the proposed method is competitive: it finds smaller feature subsets than state-of-the-art feature selection algorithms, and its execution time is considerably shorter while performance is maintained or even improved. In addition, the method achieves higher stability and is applicable to feature selection in high-dimensional data.
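The relevance-filtering step of WNGC, which uses symmetrical uncertainty to discard irrelevant features, can be illustrated with a minimal sketch. It uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) and assumes features and labels are already discrete. The function names, the dictionary layout of the features, and the threshold value of 0.1 are assumptions made for illustration; the abstract does not specify how the thesis sets its cut-off.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    # Mutual information via I(X; Y) = H(X) + H(Y) - H(X, Y).
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def filter_relevant(features, labels, threshold=0.1):
    """Keep the names of features whose SU with the labels exceeds threshold.

    `features` maps a feature name to its column of discrete values.
    The threshold is a hypothetical choice, not taken from the thesis.
    """
    return [
        name
        for name, column in features.items()
        if symmetrical_uncertainty(column, labels) > threshold
    ]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = {
        "copy_of_label": [0, 0, 1, 1],  # fully determined by the label
        "independent": [0, 1, 0, 1],    # carries no label information
        "constant": [7, 7, 7, 7],       # zero entropy
    }
    print(filter_relevant(features, labels))
```

In this toy data only the feature that mirrors the label survives the filter. The later stages described in the abstract (graph construction, community detection, and the iterative subset search) would operate on the features that remain after this step.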
Keywords/Search Tags:feature selection, weighted network graph, graph clustering, community detection, subspace ensemble