Prediction Of Antifungal Activity And Human Intestinal Absorption Of Drugs By Using Support Vector Machine

Posted on: 2006-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: S W Chen
Full Text: PDF
GTID: 2144360155465397
Subject: Applied Chemistry
Abstract/Summary:
In this dissertation, the support vector machine (SVM) classification method was employed to discriminate between active and inactive chemical compounds. The dissertation consists of two parts. The first part describes the fundamentals of SVM and other related methods. The second part, comprising chapters two and three, applies the SVM classification method to antifungal compounds and to human intestinal absorption compounds. The first chapter presents the theory of SVM and of other classification methods, including k-nearest neighbor (KNN) and the C4.5 decision tree. The descriptors calculated to encode the structural and physicochemical properties of molecules are listed, and a series of molecular shape descriptors defined by our research group is introduced. The principle of combining a genetic algorithm with SVM is also described. In the second chapter, a total of 67 descriptors were calculated to characterize the structural and physicochemical properties of 94 chemical compounds, comprising 42 antifungal active compounds and 52 inactive compounds. The SVM classification method was employed to discriminate between antifungal activity and inactivity for these compounds. Leave-one-out (LOO) cross-validation was used to optimize the SVM model, and a genetic algorithm was used for variable selection, which reduced the number of molecular descriptors from 67 to 30. Five-fold cross-validation and an independent evaluation set were used to test the SVM model, with the training sets chosen effectively and evenly in descriptor space by clustering based on chemical similarity; both test methods gave results consistent with the LOO method. The SVM method was also compared with other statistical classification methods, including KNN and the C4.5 decision tree, using the same pre-selected molecular descriptors.
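The variable selection step described above, a genetic algorithm scored by leave-one-out cross-validation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the function names (`loo_accuracy`, `ga_select`, `make_toy_data`), the GA settings, and the per-descriptor penalty are hypothetical, and a k-nearest neighbor classifier stands in for the SVM so that the example needs only the Python standard library. It is not the procedure used in the dissertation.

```python
import random


def loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of a k-NN classifier on the descriptors where mask is 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Squared distances from the held-out sample i to every other sample.
        dists = sorted(
            (sum((X[i][j] - X[n][j]) ** 2 for j in cols), y[n])
            for n in range(len(X)) if n != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)


def ga_select(X, y, n_gen=15, pop_size=12, p_mut=0.1, seed=0):
    """Toy genetic algorithm over binary descriptor masks (hypothetical settings)."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        # Reward LOO accuracy; the small penalty per descriptor is an assumption.
        return loo_accuracy(X, y, mask) - 0.01 * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feat)    # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def make_toy_data(n=40, seed=1):
    """Two informative descriptors followed by four noise descriptors."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(4.0 * label, 0.5), rng.gauss(-4.0 * label, 0.5)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
best = ga_select(X, y)
print("selected descriptor mask:", best)
print("LOO accuracy with selected descriptors:", loo_accuracy(X, y, best))
```

In a real setting the fitness function would wrap the cross-validated accuracy of an SVM, and the mask length would match the 67 descriptors; the wrapper structure stays the same.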
In the third chapter, the method of chapter two is used to build a model that predicts human intestinal absorption. To characterize the structural and physicochemical properties of 230 chemical compounds, the set of molecular descriptors was extended from 67 to 102. Five-fold cross-validation was used in variable selection by the genetic algorithm, which reduced the number of molecular descriptors from 102 to 47. Five-fold cross-validation and an independent evaluation set were used to test the SVM model, with the training sets chosen in descriptor space by C-means clustering; both test methods gave consistent results. Our work suggests that a proper choice of training set by clustering, for either 5-fold cross-validation or the independent test method, can save time. The prediction results lead to the conclusion that genetic algorithms (GAs) are useful for removing redundant descriptors and improve the computational efficiency of the statistical system. Our investigation indicates the potential of SVM in facilitating the prediction of drug activity, including antifungal activity and human intestinal absorption.
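The idea of choosing training sets evenly in descriptor space by clustering can be sketched as follows. This is a hedged illustration, not the dissertation's procedure: it uses a plain hard k-means in place of the C-means clustering named above, and the helper names (`kmeans`, `cluster_split`), the toy two-dimensional points, the number of clusters, and the test fraction are all assumptions.

```python
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain (hard) k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_split(points, k=3, test_fraction=0.25, seed=0):
    """Hold out a fraction of every cluster so both sets cover descriptor space."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    train, test = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        n_test = int(len(members) * test_fraction)  # rounds down, so train keeps >= 1
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test), labels


# Toy descriptor space: three loose groups of twelve compounds each.
rng = random.Random(2)
points = [
    [rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)]
    for cx, cy in [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    for _ in range(12)
]
train, test, labels = cluster_split(points)
print("training set size:", len(train), "test set size:", len(test))
```

Because every non-empty cluster contributes to the training set, the model sees each region of descriptor space during training. The same per-cluster dealing could be used to assign compounds to the five folds of a cross-validation.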
Keywords/Search Tags: Support Vector Machines, Molecular Descriptors, Variable Selection, Training Set Design, Genetic Algorithm, Human Intestinal Absorption, Antifungal Activity