
Study On The Correlation Between Molecular Descriptors And QSAR Model Prediction

Posted on: 2017-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: X F Pan
GTID: 2348330485458356
Subject: Engineering
Abstract/Summary:
Intelligent computing methods are widely used to build models on chemical databases, and model performance is the primary concern, one that has attracted considerable attention. Model accuracy is usually affected by many factors, such as the data type, the size of the data set, the choice of modeling method, the screening of input features, and the relationships among those features. It has also been shown that no single machine learning method consistently outperforms all others on every database. Because the type and characteristics of the research data affect both how well a model fits and how well it performs, studying the performance of intelligent computing algorithms on similar databases helps in applying and improving those algorithms. When constructing a single machine learning model, several factors must be considered to obtain an optimal model. Among them, descriptor selection carries large uncertainty because of its complexity, whereas the effects of descriptor precision and of the relationships among input descriptors (redundancy) on the model are likely to be easier to understand. This thesis therefore studies how the accuracy and the redundancy of molecular descriptors influence the performance of QSAR calibration models built with intelligent computing methods on chemical molecule databases. The study of descriptor accuracy is based on quantum chemical molecular descriptors of differing accuracy, calculated with different density functional methods. The selected molecular descriptors were then modeled with three regression methods: Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Random Forest (RF).
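ELM is less commonly packaged in standard libraries than SVM or RF, so the following minimal NumPy sketch may help clarify how it works. It assumes the textbook single-hidden-layer formulation: random input weights and biases that are never trained, a sigmoid activation, and output weights fitted in one step by least squares through the Moore-Penrose pseudoinverse. The class `ExtremeLearningMachine`, the helper `rmse`, the hidden-layer size, and the synthetic descriptor data are hypothetical and do not reflect the settings used in the thesis.

```python
import numpy as np


class ExtremeLearningMachine:
    """Minimal single-hidden-layer ELM regressor (hypothetical sketch)."""

    def __init__(self, n_hidden=25, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the fixed random projection of the inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the pseudoinverse of H.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Hypothetical data: 200 "molecules" with 5 descriptors and a smooth target.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

    model = ExtremeLearningMachine(n_hidden=25, seed=1).fit(X[:150], y[:150])
    print("test RMSE:", rmse(y[150:], model.predict(X[150:])))
```

Because only the output weights are fitted, training reduces to a single linear solve; the trade-off is that results depend on the random seed, so in practice one might average over several seeds or tune `n_hidden` by cross-validation.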
The results show that the correlation between the molecular descriptors and the target values affects the predictive performance of the regression models: the higher this correlation, the better the predictive performance of comparable QSAR models. To study how correlation (redundancy) among the features influences the regression models, the redundancy of the molecular descriptors was varied by changing the number and type of input descriptors in two ways: adding similar descriptors to the model input, and substituting similar descriptors for existing ones. The results show that correlation (redundancy) among the features has little influence on the predictive performance of the chemical calculation correction model; the models performed well at both high and low redundancy. In summary, improving the correlation between molecular descriptors and target values can improve the predictive performance of regression models built on chemical databases, and this correlation can serve as a criterion for choosing the quantum chemistry method used to calculate the descriptors. In such QSAR calibration models, the correlation among the molecular descriptors themselves may not need much attention.
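The abstract distinguishes two kinds of correlation: between each descriptor and the target, and among the descriptors themselves (redundancy). One simple way to quantify both, assumed here rather than taken from the thesis, is the Pearson correlation coefficient: the absolute r between each descriptor column and the target, and the mean absolute off-diagonal r among descriptors. The function names `target_correlations` and `redundancy` and the toy data are hypothetical. The demo mimics the "addition" and "substitution" of a similar descriptor with a noisy copy of an existing column.

```python
import numpy as np


def target_correlations(X, y):
    """Absolute Pearson r between each descriptor column of X and target y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.abs(r)


def redundancy(X):
    """Mean absolute Pearson r over all distinct pairs of descriptor columns."""
    R = np.corrcoef(X, rowvar=False)
    off_diag = R[~np.eye(R.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diag)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))  # hypothetical descriptors
    y = 2.0 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=100)

    print("|r| with target:", target_correlations(X, y))
    print("redundancy, original:", redundancy(X))

    # A "similar" descriptor: a slightly noisy copy of column 0.
    similar = X[:, 0] + 0.05 * rng.normal(size=100)

    # Addition: append the similar descriptor as a fourth input.
    X_added = np.column_stack([X, similar])
    print("redundancy, added:", redundancy(X_added))

    # Substitution: replace column 2 with the similar descriptor.
    X_sub = X.copy()
    X_sub[:, 2] = similar
    print("redundancy, substituted:", redundancy(X_sub))
```

Under this measure, both operations raise the redundancy score while the correlation of the original informative column with the target is unchanged, which is the kind of contrast the experiments above vary.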
Keywords/Search Tags: Intelligent method, Descriptor accuracy, Redundancy, Support vector machine, Extreme learning machine, Random forest