
Chi-MIC-share Feature Selection Algorithm And Its Application In QSAR

Posted on: 2021-10-20  Degree: Master  Type: Thesis
Country: China  Candidate: Y T Li  Full Text: PDF
GTID: 2480306518989779  Subject: Bioinformatics
Abstract/Summary:
Quantitative Structure-Activity Relationship (QSAR) modeling uses statistical analysis to build a model that correlates compound structure with biological activity. By describing that relationship quantitatively, it predicts the activities of new compounds and guides molecular design. Choosing appropriate feature selection methods and statistical models can improve the prediction accuracy of a QSAR model and enhance its interpretability. Univariate filtering methods rank features by a correlation statistic without considering the redundancy between features. Multivariate filtering methods, such as the commonly used minimal Redundancy Maximal Relevance (mRMR), do consider redundancy between features, but removing redundancy directly often reduces prediction accuracy. Both kinds of method only rank features by importance and need cross-validation to search for a feature subset, which is time-consuming.

Based on Chi-MIC, an improved algorithm for the Maximum Information Coefficient (MIC), and a redundancy allocation strategy, this thesis develops Chi-MIC-share, a feature selection method that terminates automatically, has low complexity, computes quickly, and is independent of the learner. After features were selected with Chi-MIC-share on three QSAR datasets (Tetrahymena pyriformis, tadpole, and fathead minnows), the independent predictions of the Support Vector Regression (SVR) model gave MSE values of 0.0280, 0.0226, and 0.0321 and R2 values of 0.9590, 0.9750, and 0.9367, respectively. These results are superior to those of the reference methods, demonstrating the effectiveness of Chi-MIC-share.

Using the same three datasets and the Chi-MIC-share method, Multiple Linear Regression (MLR), Partial Least Squares Regression (PLS), Ridge Regression (Ridge), Random Forest (RF), and Artificial Neural Network (ANN) models were built for prediction and compared with the SVR model. The results show that the linear MLR and PLS models perform poorly, that the SVR and RF models perform better than the Ridge and ANN models, and that SVR is superior to RF, indicating the robustness of SVR on small-sample nonlinear data.

Interaction describes the relationship between the overall effect of a system and its partial effects, and combining main-effect features with interaction features can improve the prediction performance of a model. Based on the Abs feature interaction mode (Zij = |Xi - Xj|, that is, the interaction feature Zij is derived from the features Xi and Xj), this thesis introduces a single interaction feature and multiple interaction features into the QSAR study. The experimental results show that introducing interaction features can improve the prediction accuracy of the SVR model. The Chi-MIC-share feature selection method proposed in this study, together with the attempts at model selection and feature interaction, may provide new ideas for quantitative research and have some reference value.
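As a minimal sketch of the Abs interaction mode described above, the snippet below builds Zij = |Xi - Xj| for every pair of descriptor columns. The function name `abs_interaction_features`, the descriptor names, and the toy values are illustrative assumptions and are not taken from the thesis. The sketch does not reproduce how the thesis chooses which interaction features to keep.

```python
from itertools import combinations


def abs_interaction_features(rows, names):
    """Return (new_names, new_rows) holding Z_ij = |X_i - X_j| for all i < j.

    rows  : list of samples, each a list of descriptor values
    names : list of descriptor names, one per column
    """
    pairs = list(combinations(range(len(names)), 2))
    new_names = [f"|{names[i]}-{names[j]}|" for i, j in pairs]
    new_rows = [[abs(row[i] - row[j]) for i, j in pairs] for row in rows]
    return new_names, new_rows


# Hypothetical toy data: 2 compounds x 3 descriptors (made-up values).
names = ["X1", "X2", "X3"]
rows = [
    [1.0, 4.0, 2.5],
    [0.5, 0.5, 3.0],
]

z_names, z_rows = abs_interaction_features(rows, names)
for name, column in zip(z_names, zip(*z_rows)):
    print(name, list(column))
```

With p main-effect features this produces p(p-1)/2 candidate interaction features, so in practice the candidates would normally be filtered by a feature selection step before being passed to a model such as SVR.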
Keywords/Search Tags: Quantitative structure-activity relationship, Feature selection, Maximum information coefficient, Redundant allocation