Study For Feature Selection Algorithm Based On Maximum Information Coefficient And Redundancy Sharing And MIC Optimization Algorithm

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: J J Yang
GTID: 2480306518989849
Subject: Bioinformatics
Abstract/Summary:
Feature selection is a key step in machine learning: it can improve model training efficiency, improve prediction accuracy, and enhance model interpretability. The classical feature selection method Minimal Redundancy Maximal Relevance (mRMR) considers both the relevance of features to the target variable and the redundancy among features; it is robust and widely used. However, the method has several drawbacks: its relevance measure is not comparable to its redundancy measure, the one-by-one introduction of ranked features cannot terminate automatically, and redundant features are removed outright. In this study, we introduce the Maximum Information Coefficient (MIC), a measure that captures both linear and nonlinear correlations, and use it for both the relevance measure and the redundancy measure in the mRMR algorithm, which resolves the incomparability of the two measures. The new algorithm terminates feature selection automatically according to the highest MIC-share score principle, without relying on a predictive model, which effectively improves the efficiency of feature selection. Experiments on two regression datasets (Friedman and Housing) and two classification datasets (Breast and Sonar), evaluated with support vector machine prediction models, verify the validity of the new algorithm.

Replacing the relevance and redundancy measures in the mRMR algorithm with MIC scores resolves their incomparability and has the advantage of capturing both linear and nonlinear relationships. However, the MIC estimation algorithm ApproxMaxMI suffers from low statistical power and inaccurate MIC estimates because its maximum grid constraint criterion, B(n), is set empirically. In this study, a new MIC estimation algorithm, OIC (Optimal Information Coefficient), is proposed based on the chi-square independence test. It automatically constrains the number of segments in both the equipartition direction and the optimization direction, and automatically terminates grid divisions that are not significant. On simulated data, the MIC values estimated by the OIC algorithm conform better to the theoretical range [0,1], and the algorithm shows higher statistical power and computational efficiency. Three real-world use cases also validate its effectiveness.
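The idea of scoring relevance and redundancy on one common scale, so that their difference gives a natural stopping point, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not define the MIC-share score, so the sketch assumes a classical mRMR-style score (relevance minus mean redundancy) and a hypothetical stopping rule (stop when no candidate scores above zero). Absolute Pearson correlation stands in for MIC only because it also lies in [0, 1] and needs no external library; unlike MIC, it captures linear relationships only. The names `abs_pearson` and `greedy_select` and the toy data are illustrative assumptions.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation in [0, 1]; a stand-in for MIC (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return abs(sxy) / math.sqrt(sxx * syy)


def greedy_select(features, target, measure=abs_pearson):
    """Greedy mRMR-style selection using one measure for both terms.

    features: dict mapping feature name -> list of values.
    Score = relevance to target minus mean redundancy with selected features.
    Hypothetical stopping rule: stop when no candidate has a score above 0.
    """
    relevance = {name: measure(col, target) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining:
        best_name, best_score = None, 0.0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if selected:
                redundancy = sum(
                    measure(features[name], features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:  # nothing adds more relevance than redundancy
            break
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Toy data: x1_copy is a rescaled duplicate of x1; x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
features = {"x1": x1, "x1_copy": [2 * v for v in x1], "x3": x3}
target = [a + 5 * b for a, b in zip(x1, x3)]

selected = greedy_select(features, target)
print(selected)
```

On this toy data the selection stops after two features: once one of the duplicated columns is in the selected set, the other column's mean redundancy exceeds its relevance, so its score falls below zero and the loop ends without consulting any predictive model. Subtracting the two terms is only meaningful because both come from the same measure on the same scale, which is the comparability point the abstract makes.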
Keywords/Search Tags: Feature Selection, Correlation Measure, Minimum Redundancy Maximum Relevance, Maximum Information Coefficient, Redundancy Sharing, χ²-test