
Research On Integrated Feature Selection Method Based On Multiple Correlation Measures And Discriminant Structure Vector

Posted on: 2024-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Ren
Full Text: PDF
GTID: 2568307097450304
Subject: Computer Science and Technology

Abstract/Summary:
Feature selection is a way to reduce data dimensionality: its purpose is to select a small set of representative features in place of the original high-dimensional feature set while improving classification accuracy. It has been widely used in many important fields such as image classification, genetic testing, and text mining. Filter methods select features using correlation measures such as the chi-square test and information gain. These measures typically score the correlation between each feature and the class labels and rank features by that score, but methods that evaluate features with a single score tend to overlook features that have weak global discriminative ability yet discriminate well between particular categories.

Feature selection based on discriminant structure complementarity decomposes the classification problem into multiple binary classification subproblems, uses a feature metric to measure each feature's discriminative ability on every subproblem, and thereby obtains a discriminant structure vector for each feature. On this basis it removes irrelevant features, and then uses a greedy strategy together with discriminant structure complementarity to find and eliminate redundant features. However, different measures have their own advantages and disadvantages; fusing several of them with ensemble ideas can combine their strengths and yield a better feature subset.

This thesis therefore proposes an ensemble feature selection method based on feature discriminant structure vector complementarity (EFS-DSVC). It integrates the discriminative abilities assessed by two univariate measures, symmetric uncertainty (SU) and the Fisher discriminant ratio (FDR), and one multivariate measure, ReliefF, combining each with discriminant structure vector complementarity to eliminate redundant features and determine per-measure feature subsets; their union is taken as the final feature subset. Experiments on 4 UCI datasets and 2 gene expression datasets show that the method improves classification performance compared with existing feature selection methods and with its base feature selectors.

EFS-DSVC merges the outputs of its base feature selectors, which can introduce redundancy and an excessive number of selected features. On this basis, this thesis further proposes an ensemble feature selection method based on different metrics and an improved aggregation strategy (EFSMMIAS). In this method, SU, FDR, and ReliefF are each combined with discriminant structure vector complementarity to obtain three feature subsets for re-aggregation, and the same classifier is used for classification prediction on each. The subset with the highest classification accuracy is selected as the optimal feature subset; features from the other two subsets are then added to it whenever they improve classification accuracy. EFSMMIAS is a classifier-dependent method, so k-nearest neighbours, decision trees, and random forests are chosen as the classifiers it is combined with. The three configurations are compared with traditional feature selection methods and with the base feature selectors on 11 standard datasets, and the experimental results verify the effectiveness of the proposed method.
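The discriminant structure vector described above can be sketched as follows. This is a minimal illustration, not the thesis' exact implementation: it assumes a one-vs-rest decomposition of the multi-class problem and uses the Fisher discriminant ratio (FDR) as the per-subproblem metric; the function names, toy data, and thresholds are all illustrative.

```python
import numpy as np

def fisher_ratio(x, y_bin):
    """FDR of one feature on a binary subproblem: (m1 - m0)^2 / (v1 + v0)."""
    x0, x1 = x[y_bin == 0], x[y_bin == 1]
    num = (x1.mean() - x0.mean()) ** 2
    den = x1.var() + x0.var() + 1e-12   # guard against zero variance
    return num / den

def discriminant_structure_vectors(X, y):
    """One row per feature; one column per one-vs-rest binary subproblem."""
    classes = np.unique(y)
    dsv = np.empty((X.shape[1], classes.size))
    for j in range(X.shape[1]):
        for k, c in enumerate(classes):
            dsv[j, k] = fisher_ratio(X[:, j], (y == c).astype(int))
    return dsv

# Toy data: feature 0 separates class 0, feature 1 separates class 2,
# and feature 2 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 3))
X[y == 0, 0] += 4.0
X[y == 2, 1] += 4.0

dsv = discriminant_structure_vectors(X, y)
# Feature 0 scores high on the class-0 subproblem and feature 1 on the
# class-2 subproblem, while the noise feature stays low everywhere --
# exactly the "locally strong" features a single global score can miss.
print(dsv.round(2))
```

Because each feature keeps one score per subproblem instead of a single aggregate, two features whose vectors peak on different subproblems are complementary, which is what the greedy redundancy-elimination step exploits.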
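The re-aggregation step of EFSMMIAS can likewise be sketched. This is an assumption-laden illustration rather than the thesis' implementation: leave-one-out 1-nearest-neighbour accuracy stands in for the classifier-dependent score, and the three hard-coded candidate subsets stand in for the outputs of the SU, FDR, and ReliefF selection paths.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy: the classifier-dependent subset score."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample cannot be its own neighbour
    return float((y[d.argmin(axis=1)] == y).mean())

def aggregate(X, y, subsets):
    """Keep the most accurate subset, then add outside features that help."""
    def score(s):
        return loo_1nn_accuracy(X[:, sorted(s)], y)
    # 1. Select the candidate subset with the highest classification accuracy.
    best = max(subsets, key=score)
    best_acc = score(best)
    # 2. Move features from the other subsets in only when accuracy improves.
    for s in subsets:
        for f in sorted(s - best):
            if score(best | {f}) > best_acc:
                best = best | {f}
                best_acc = score(best)
    return best, best_acc

# Toy data: feature 0 separates class 0, feature 1 separates class 1,
# features 2 and 3 are noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 4))
X[y == 0, 0] += 4.0
X[y == 1, 1] += 4.0

# Stand-ins for the three base selectors' subsets (SU / FDR / ReliefF paths).
final, acc = aggregate(X, y, [{0}, {1}, {2, 3}])
print(sorted(final), round(acc, 3))
```

Starting from the single best subset and admitting features only when they raise accuracy is what keeps the aggregated subset small, in contrast to EFS-DSVC's plain union of the base selectors' outputs.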
Keywords/Search Tags: ensemble feature selection, discriminant structure vector complementarity, feature subset, aggregation strategy