Prediction Of Dihydrofolate Reductase Inhibitors Activity Using Machine Learning Methods

Posted on: 2008-01-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X M Chen
GTID: 2144360242463999 | Subject: Chemical Biology
Abstract/Summary:
In this dissertation, machine learning methods were employed to build models that discriminate active from inactive chemical compounds. The dissertation consists of two parts: the first describes the fundamentals of machine learning methods, and the second applies these methods to predicting the bioactivity of dihydrofolate reductase (DHFR) inhibitors.

In the first chapter, an introduction to computer-aided drug design was given, and the theory of several machine learning methods was described, including Support Vector Machine (SVM), Artificial Neural Network (ANN), Logistic Regression (LR) and K-Nearest Neighbor classification (K-NN). The descriptors used to encode the structural and physicochemical properties of molecules were defined and listed. The principles of the methods used to design the training set (random selection, Kohonen self-organising maps and the Kennard-Stone method) and of several feature selection methods (the Metropolis Monte Carlo algorithm and the genetic algorithm) were then described. Finally, the criteria used to evaluate the models were given.

In the second chapter, SVM, ANN, LR and K-NN were used to develop classification models. A total of 463 descriptors were calculated to characterize the structural and physicochemical properties of each of the 761 DHFR inhibitors. A comparison of two training-set design methods, random selection and the Kennard-Stone method, showed that the Kennard-Stone method performed better. Metropolis Monte Carlo simulation was used for feature selection. SVM outperformed the other machine learning methods used in this study, and the final SVM model after feature selection achieved a prediction accuracy of 91.62%.
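The Kennard-Stone training-set design compared above can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes the descriptors have already been scaled to comparable ranges, uses plain Euclidean distance, and seeds the selection with the two most distant samples (the classic variant; some implementations seed with the sample closest to the mean instead). The function name `kennard_stone` is hypothetical.

```python
import math
from typing import List, Sequence


def kennard_stone(X: Sequence[Sequence[float]], n_select: int) -> List[int]:
    """Return indices of n_select samples picked by the Kennard-Stone algorithm.

    Assumption: X holds one (already scaled) descriptor vector per compound,
    distances are Euclidean, and the seed is the two most distant samples.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")

    # Pairwise Euclidean distances between all samples (O(n^2) memory).
    dist = [[math.dist(a, b) for b in X] for a in X]

    # Seed the training set with the two samples that lie farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # Distance from each unselected sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected
```

The returned indices form the training set and the remaining indices the test set, e.g. `test_idx = [i for i in range(len(X)) if i not in train_idx]`. Because each pick maximizes the distance to the samples already chosen, the training set tends to span the descriptor space more evenly than a random draw, which is one plausible reason it can outperform random selection.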
These results suggest that SVM is potentially useful for predicting the activity of a diverse set of DHFR inhibitors, and that designing the training set with the Kennard-Stone method and selecting features by Metropolis Monte Carlo simulation can improve model performance.
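Metropolis Monte Carlo feature selection can be illustrated with the sketch below. It rests on several assumptions and is not the thesis's procedure: the state is a subset of descriptor indices, a move toggles one descriptor in or out, and the objective is the leave-one-out accuracy of a 1-nearest-neighbour classifier (a stand-in chosen to keep the example self-contained; the thesis's final model was an SVM). The temperatures, step count and geometric cooling schedule (which make this a simulated-annealing variant) are likewise assumptions, and `loo_1nn_accuracy` and `metropolis_feature_selection` are hypothetical names.

```python
import math
import random
from typing import Callable, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]
ScoreFn = Callable[[Matrix, Sequence[int], Set[int]], float]


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], features: Set[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on a feature subset.

    Stand-in objective (assumption), not the scoring used in the thesis.
    """
    if len(X) < 2:
        raise ValueError("need at least two samples")
    if not features:
        return 0.0
    feats = sorted(features)
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]  # bool counts as 0 or 1
    return correct / len(X)


def metropolis_feature_selection(
    X: Matrix,
    y: Sequence[int],
    score: ScoreFn = loo_1nn_accuracy,
    n_steps: int = 500,
    t_start: float = 0.05,
    t_end: float = 0.001,
    seed: int = 0,
) -> Tuple[Set[int], float]:
    """Search descriptor subsets with the Metropolis acceptance criterion.

    Move set, objective and geometric cooling schedule are illustrative
    assumptions, not the thesis's settings.
    """
    rng = random.Random(seed)
    n_feat = len(X[0])
    # Random initial subset; fall back to feature 0 if it comes out empty.
    current = {f for f in range(n_feat) if rng.random() < 0.5} or {0}
    cur_score = score(X, y, current)
    best, best_score = set(current), cur_score

    for step in range(n_steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        # Proposal: toggle one randomly chosen descriptor in or out.
        candidate = current ^ {rng.randrange(n_feat)}
        if not candidate:
            continue  # never evaluate an empty subset
        cand_score = score(X, y, candidate)
        delta = cand_score - cur_score
        # Metropolis criterion for maximisation: accept any improvement,
        # accept a worse subset with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    return best, best_score
```

Given a descriptor matrix `X` and 0/1 activity labels `y`, `metropolis_feature_selection(X, y)` returns the best subset visited and its score. Accepting occasional worse subsets at high temperature lets the search escape local optima that a purely greedy add/remove procedure would get stuck in. In a real study the objective should be computed on the training data only (for example by cross-validation), so that the held-out test set remains an unbiased check.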
Keywords/Search Tags: Support Vector Machines, Machine Learning Methods, Molecular Descriptors, Feature Selection, Training Set Design, DHFR Inhibitors