
Computer-Aided Prediction Of Properties Of Drugs And Proteins

Posted on: 2011-08-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L L Xi
Full Text: PDF
GTID: 1100360305965952
Subject: Chemical informatics
Abstract/Summary:
In recent years, the development of combinatorial chemistry and high-throughput screening has produced a vast amount of data related to chemistry, biology and drugs. However, molecular structures and the sequences and structures of biomacromolecules are obtained much faster than the corresponding property or function data, which to some extent keeps researchers from extracting knowledge from them. Computer-aided prediction of properties is an effective way to address this gap.

This dissertation uses computer-aided methods to study the properties of proteins and drugs, the interaction modes between ligands and proteins, and the related bioactivities. The first aim is to build accurate and fast predictive models from known data in order to predict the properties of unknown samples. The second aim is to explore the developed models to reveal the critical factors influencing the studied properties, which can provide useful information for optimizing samples. Finally, we expect the computer-aided approach and the resulting models to be of practical use in screening for the required molecules, reducing experimental cost and time, and speeding up screening.

In Chapter 1, the principles of the computer-aided method are introduced. All of its stages are described: data acquisition, sample pre-processing, characterization of the studied samples, development of a stable, reliable and predictive model, and model validation and assessment. The principle of molecular docking, which is used to study the interaction mode between ligand and protein, is also introduced. Finally, the algorithms used in this dissertation are presented.

In Chapter 2, we applied the computer-aided method to predict the properties of proteins.
The research covers two basic aspects of the protein folding process: quantitatively predicting folding rates and recognizing the type of protein folding pathway. In the first work, the main purpose was to develop a general, fast and accurate model that predicts protein folding rates based entirely on protein sequence information. Amino acid sequence autocorrelation (AASA) was used to represent 101 protein samples. Based on the significant features selected by a genetic algorithm (GA), a global method (multiple linear regression, MLR) and a local method (local lazy regression, LLR) were used to develop prediction models for protein folding rates. The LLR method performed better than MLR, and three-fold, five-fold and ten-fold cross-validation showed that the local model was more robust and stable than the global one. Furthermore, we analyzed the significant features, including unfolding entropy change, hydrophobicity, secondary structure tendency and flexibility, which have a great effect on folding rates. In the second work, the same 101 protein sequences were used, again represented by purely sequence-based AASA features. Support vector machine-recursive feature elimination (SVM-RFE) was used to rank all the calculated features according to the weights of the support vectors. Based on the results of leave-one-out validation, least squares support vector machines (LS-SVMs) were used to build a classification model from the top seven features. The accuracy was 91.09% and the MCC value was 80.88%. Three-fold, five-fold and ten-fold cross-validation showed that the classification model was stable, reliable and predictive.
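As an illustration of the sequence-autocorrelation idea behind the AASA features, the following minimal Python sketch computes a mean-centered autocorrelation of one residue property over a sequence. The exact AASA definition and the property scales used in the dissertation are not given in this abstract, so the formula, the choice of the Kyte-Doolittle hydropathy scale and the function name `autocorrelation` are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values; an assumed example scale, not
# necessarily one of the scales used in the dissertation.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(sequence, scale, max_lag):
    """Return one autocorrelation feature per lag 1..max_lag (hypothetical helper).

    Each residue is mapped to a property value, the values are centered on
    their mean, and products of values `lag` positions apart are averaged.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Example: three lag features for a short, made-up peptide.
example_features = autocorrelation("MKVLAAGIVG", KYTE_DOOLITTLE, 3)
```

Repeating this for several property scales and lags gives a fixed-length feature vector for sequences of any length, which is what makes such descriptors convenient inputs for feature selection and regression.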
Additionally, we analyzed the significant features to reveal the factors influencing the type of protein folding kinetic pathway, and found that amino acid properties, unfolding Gibbs free energy change, hydrophobicity, secondary structure and charge play vital roles in protein folding behavior.

In Chapter 3, we applied the computer-aided method to predict the interaction mode and interaction strength between ligand and protein. In the first work, a combined molecular modeling approach from the perspectives of the protein, the ligand and their complex was used to gain insight into the structure-activity relationships and protein-ligand interaction modes of 58 novel, potent gelatinase inhibitors. (1) Protein perspective: sequence alignment and structure superimposition provide a better understanding of the binding sites of the proteins. (2) Inhibitor perspective: the QSAR study of the 58 inhibitors gives accurate activity predictions and some insight into the structural features responsible for the activity. (3) Protein-ligand complex perspective: a molecular docking study was performed to identify the key residues and the critical interactions between the ligands and the proteins. This multi-angle research strategy provides more information and offers a new route to the design of new potent inhibitors. In the second work, a series of new MMP-13 inhibitors was studied, with a focus on two important issues in QSAR: the selection of the active conformation and the characterization of the samples. Since the three-dimensional structure of MMP-13 is known, the molecular docking program Glide was used to dock all the studied compounds into the active site of MMP-13, yielding an active conformation for each compound.
In the characterization step, structural descriptors and ADME-related descriptors were calculated, together with descriptors based on the docked ligand-protein complex conformation that describe the interaction between ligand and protein. A genetic algorithm was used to select the descriptors most important for inhibitory activity, and an MLR model (i.e. the global model) was constructed at the same time; both internal and external validation showed that this model was stable and predictive. Considering the strengths of local models, an LLR model was also developed. Compared with the global model, the local model significantly improved predictive ability.

In Chapter 4, we applied the computer-aided method to predict the ADME/Tox-related properties of drugs. In the first work, CYP2C19 was studied. Using a structurally diverse set of 7750 compounds, random forest (RF) was used to develop a classification model that recognizes substrates of CYP2C19. Based on the 6200 compounds in the training set, RF selected 19 important descriptors and built the classification model. The model was then applied to the 1550 compounds in the external test set, giving an accuracy of 93.42% and an MCC value of 80.36%. The developed model classifies quickly and recognizes substrates accurately, so it can be applied to recognize substrates of CYP2C19 in the early stages of drug discovery. We expect that it can provide researchers with useful theoretical information, reduce the probability of metabolism-related drug-drug interactions, and improve the effectiveness and safety of drugs. In the second work, using a structurally diverse set of 947 compounds, SVM-RFE was used to rank all the calculated descriptors according to the weights of the support vectors. The LS-SVMs algorithm was used to build a classification model that recognizes compounds inducing hepatic injury.
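The accuracy and MCC values quoted for the classification models are standard confusion-matrix statistics. The Matthews correlation coefficient lies between -1 and 1, and the abstract reports it as a percentage. The sketch below shows how both are computed from the four confusion-matrix counts; the function name `accuracy_and_mcc` and the example counts are hypothetical and are not taken from the dissertation.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Compute accuracy and Matthews correlation coefficient (MCC).

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives of a binary classifier.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Made-up counts for illustration only (not results from the dissertation).
acc, mcc = accuracy_and_mcc(tp=40, tn=50, fp=5, fn=5)
print(f"accuracy = {acc:.2%}, MCC = {mcc:.4f}")
```

Unlike accuracy, MCC stays low when a classifier simply favors the majority class, which is why it is often reported alongside accuracy for unbalanced data sets.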
Based on the 710 compounds in the training set and the results of leave-one-out validation, the top fifteen descriptors were used to build the LS-SVMs classification model, which reached an accuracy of 76.48%. For the 237 compounds in the external test set, the accuracy was 70.04%. These results show that the classification model can be applied to determine whether a compound induces toxicity in human hepatocytes; in particular, the recognition of compounds that can induce hepatic injury was very accurate. This indicates that the computer-aided method is an effective tool that can also be applied to predict other ADME/Tox-related properties. Furthermore, the computer-aided method can be used in the early stages of drug discovery to provide useful information and, to some extent, to improve screening speed.
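The contrast between the global (MLR) and local (LLR) models in Chapters 2 and 3 can be illustrated with a minimal sketch of the local lazy regression idea: for each query sample, a linear model is fitted only on its nearest training samples. The abstract does not give the implementation details, so the Euclidean distance, the neighborhood size `k` and all function names below are assumptions.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: use more neighbours or fewer descriptors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_local(train_x, train_y, query, k):
    """Predict `query` from a linear model fitted on its k nearest neighbours."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    coef = fit_linear([train_x[i] for i in nearest], [train_y[i] for i in nearest])
    return coef[0] + sum(c * q for c, q in zip(coef[1:], query))


# Toy non-linear data (y = x^2) with a single descriptor, for illustration only.
train_x = [[float(i)] for i in range(11)]
train_y = [x[0] ** 2 for x in train_x]
local_prediction = predict_local(train_x, train_y, [2.5], k=4)
global_prediction = predict_local(train_x, train_y, [2.5], k=len(train_x))
```

Setting `k` to the full training-set size reduces the local model to the global MLR fit, so the two predictions can be compared directly; on curved data such as this toy example the local fit follows the trend near the query more closely.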
Keywords/Search Tags: computer-aided method, quantitative structure-activity relationship (QSAR), pattern recognition, protein folding, matrix metalloproteinase, ADME/Tox properties