
Research On Feature Selection Method Based On Clustering Ensemble

Posted on: 2022-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Li  Full Text: PDF
GTID: 2518306554964679  Subject: Computer application technology
Abstract/Summary:
With the advent of the big-data era, the amount of data that must be analyzed and processed is growing exponentially. As an important part of data-mining preprocessing, dimensionality-reduction technology can effectively reduce the computational complexity of learning algorithms, making it possible for traditional learning algorithms to handle large-scale data. Feature selection, as one kind of dimensionality reduction, is widely used in many fields because the reduced data remain highly readable and the data structure is unchanged. Clustering-based feature selection consists of two main steps: generating a feature clustering result, and selecting features from the clusters. However, clustering-based feature selection algorithms still suffer from four problems: (1) the single clustering method such approaches usually rely on has poor robustness and low generalization ability; (2) in the clustering ensemble, the negative influence of noisy base clusterings on the ensemble result cannot be eliminated; (3) also in the clustering ensemble, the computed weights of the base clusterings tend to be too similar, so the quality of the base clusterings cannot be effectively distinguished, which degrades the feature-selection result; (4) in the feature-selection stage, the information content of each feature and the redundancy between features are not considered simultaneously.

To address these four problems, and taking the characteristics of the data sets into account, this thesis makes the following contributions:

(1) An unsupervised feature selection algorithm guided by a clustering ensemble (Clustering Ensemble Guided Feature Selection, CEGFS) is proposed. First, exploiting the different behavior of different clustering methods on different data sets, a new adaptive weighted clustering ensemble algorithm (Adaptive Weighted Clustering Ensemble, AWCE) is designed to cluster features using collective intelligence. Then, to eliminate redundancy between features while selecting highly informative ones, a centrality-entropy method (Select the Feature with Centrality-Entropy Score, SFCES) combining information entropy and centrality is proposed to select features after the clustering ensemble. The proposed algorithms are evaluated on 8 data sets from the UCI repository; the results show that AWCE, SFCES, and CEGFS improve the accuracy of both the clustering ensemble and the feature selection.

(2) An unsupervised feature selection method based on internal clustering-performance evaluation (Internal Weighting Clustering Ensemble of Feature Selection, IWCEFS) is proposed. First, an internal evaluation index is introduced to assess each clustering method, and a maximum internal-validity vector is proposed to suppress the impact of noisy base clusterings on the ensemble. Second, the weight of each base clustering is adjusted iteratively so that high-quality base clusterings receive more weight, increasing the separation between good and bad clustering results. Then, high-quality pseudo-labels are obtained from the ensemble result; this procedure is called Clustering Ensemble with Adaptive Weight Learning (CEAWL). Finally, CEAWL is combined with l2,1 sparse learning to perform feature selection. Experiments on the same 8 data sets show that this method effectively improves feature-selection performance.
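The weighted clustering-ensemble idea underlying AWCE can be illustrated with a co-association matrix: each base clustering votes, with its weight, on whether two items belong together, and the consensus partition is read off the accumulated votes. The sketch below is a minimal illustration of this general scheme, not the thesis's actual AWCE algorithm; the threshold-based consensus step and all names are simplifying assumptions.

```python
def weighted_coassociation(base_labelings, weights):
    """Entry (i, j) is the weighted fraction of base clusterings
    that place items i and j in the same cluster."""
    n = len(base_labelings[0])
    total = sum(weights)
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w / total
    return co

def consensus_by_threshold(co, tau=0.5):
    """Simple consensus: link items whose co-association exceeds tau,
    then take connected components (via union-find) as final clusters."""
    n = len(co)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > tau:
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Three base clusterings of six items; the third is noisy and down-weighted.
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 1, 0, 1, 0, 1]]
labels = consensus_by_threshold(weighted_coassociation(base, [1.0, 1.0, 0.2]))
# labels == [0, 0, 0, 1, 1, 1]
```

Down-weighting the noisy third clustering lets the two consistent base clusterings dominate the consensus, which is exactly the behavior a weighted ensemble is meant to produce.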
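One plausible reading of the centrality-entropy selection step (SFCES) is: within each feature cluster, keep the single feature that is both informative (high entropy) and representative of its cluster (high average correlation with the other members). The sketch below implements that reading; the equal-width discretization, the product scoring rule, and all helper names are assumptions, not the thesis's exact formulation.

```python
import math
from collections import Counter

def entropy(values, bins=4):
    """Shannon entropy of a feature after equal-width discretization."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def abs_pearson(x, y):
    """Absolute Pearson correlation between two feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def select_per_cluster(features, clusters):
    """Keep, per feature cluster, the feature with the highest
    centrality-times-entropy score (one representative per cluster)."""
    selected = []
    for cluster in clusters:
        best, best_score = None, -1.0
        for f in cluster:
            others = [g for g in cluster if g != f]
            centrality = (sum(abs_pearson(features[f], features[g])
                              for g in others) / len(others)) if others else 1.0
            score = centrality * entropy(features[f])
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected

features = {
    0: [1, 2, 3, 4, 5, 6, 7, 8],    # informative
    1: [1, 1, 1, 1, 1, 1, 1, 2],    # near-constant, little information
    2: [5, 3, 8, 1, 9, 2, 7, 4],    # sole member of its cluster
}
selected = select_per_cluster(features, [[0, 1], [2]])
# selected == [0, 2]: the informative feature represents the first cluster
```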
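The iterative weight adjustment in CEAWL can be caricatured as repeated normalize-and-sharpen steps applied to internal-validity scores (for example, mean silhouette values of each base clustering): squaring normalized weights widens the gap between good and bad base clusterings, and a floor zeroes out noisy ones. This is a hedged sketch of the general idea only; the squaring rule and the `floor` parameter are assumptions, not the thesis's algorithm.

```python
def sharpen_weights(scores, rounds=3, floor=0.0):
    """Turn raw internal-validity scores into ensemble weights, then
    iteratively sharpen them so better base clusterings dominate.
    Scores at or below `floor` (noisy clusterings) are zeroed out."""
    w = [max(s - floor, 0.0) for s in scores]
    for _ in range(rounds):
        total = sum(w) or 1.0
        w = [x / total for x in w]      # renormalize to a distribution
        w = [x * x for x in w]          # squaring widens good-vs-bad gaps
    total = sum(w) or 1.0
    return [x / total for x in w]

# Hypothetical internal-validity scores for three base clusterings.
weights = sharpen_weights([0.7, 0.6, 0.1])
# The weakest clustering's weight collapses toward zero.
```

Without sharpening, the weights 0.7, 0.6, 0.1 stay close together, which is precisely problem (3) above; after three rounds the best clustering carries most of the mass.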
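In l2,1 sparse learning, fitting a weight matrix W to the pseudo-labels under an l2,1 penalty drives whole rows of W toward zero, and features are then ranked by the l2 norm of their row. The snippet below shows only the norm and the resulting ranking on a hypothetical, already-learned W; the optimization itself is omitted, and the example matrix is invented for illustration.

```python
import math

def l21_norm(W):
    """l2,1 norm: sum over rows of each row's l2 norm. Large-norm rows
    mark features that matter across all pseudo-label dimensions."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)

def rank_features(W):
    """Rank feature indices by the l2 norm of their weight-matrix row,
    largest (most useful) first."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in W]
    return sorted(range(len(W)), key=lambda i: -norms[i])

# Rows = features, columns = pseudo-label dimensions (hypothetical values).
W = [[0.9, 0.8],   # strong feature
     [0.1, 0.0],   # row driven near zero by the l2,1 penalty
     [0.5, 0.5]]
ranking = rank_features(W)
# ranking == [0, 2, 1]: feature 1's near-zero row places it last
```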
Keywords/Search Tags:Unsupervised Feature Selection, Similarity Evaluation, Clustering Ensemble, Adaptive Weight Adjustment Algorithm, Sparse Learning