The Discovery And Correction Of Interprotein Scoring Noises In Glide Docking Scores

Posted on: 2013-01-23 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Wang | Full Text: PDF
GTID: 1114330371469153 | Subject: Bioinformatics
Abstract/Summary:
Small-molecule drugs are rarely selective enough to interact solely with their designated targets. Unintended "off-target" interactions often lead to side effects, but can also serendipitously lead to new therapeutic uses. Identifying the off-targets of a compound is therefore of significant value in evaluating its development potential. In computational biology, the strategy of "reverse docking" has been introduced to predict the targets of a compound: a single compound is used to virtually screen a library of proteins, reversing the roles of bait and prey in "normal" docking screens.
The present study shows that, in reverse docking, additional optimization of the scoring function may help to improve target-prediction accuracy. We chose the Astex Diverse dataset, a diverse, high-quality set of 85 ligand-protein complexes, as our standard example dataset. GlideScore in the "standard precision" mode of Glide accurately reproduced the crystal binding conformation for 58 of these complexes. In the reverse docking of those 58 complexes, however, only 57% of the ligand-protein relationships were correctly identified. This was likely the result of consistent over- or underestimation of the GlideScores for specific proteins; in other words, there was interprotein noise in the GlideScores. Using a decision tree to classify the successful and unsuccessful reverse docking cases, we found that a protein descriptor, balance, was strongly associated with whether a target prediction succeeded or failed. The balance descriptor expresses the ratio of the relative hydrophobic and hydrophilic character of the binding site. Introducing a correction term based on balance improved the target-prediction accuracy by 27% (from 57% to 72%); the new score was named BCGlideScore. It also improved the target-prediction accuracy by 29% (from 47% to 60%) on an external test dataset of similar quality to the Astex Diverse dataset.
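The reverse-docking evaluation described above can be illustrated with a minimal sketch. It assumes that a ligand's predicted target is the protein with the best (most negative) score, and that a prediction counts as a success when it matches the ligand's native protein. The abstract does not give the functional form of the balance-based correction, so the per-protein `offsets` used here are a hypothetical stand-in for it, and all names and numbers are invented toy values, not data from the study.

```python
from typing import Dict, Optional

# scores[ligand][protein] = docking score; more negative means better binding.
Scores = Dict[str, Dict[str, float]]


def predict_targets(scores: Scores,
                    offsets: Optional[Dict[str, float]] = None) -> Dict[str, str]:
    """Pick, for each ligand, the protein with the best corrected score.

    `offsets` is a hypothetical per-protein correction term (an assumption):
    it is subtracted from every score obtained against that protein.
    """
    offsets = offsets or {}
    predictions = {}
    for ligand, row in scores.items():
        corrected = {protein: score - offsets.get(protein, 0.0)
                     for protein, score in row.items()}
        predictions[ligand] = min(corrected, key=corrected.get)
    return predictions


def success_rate(predictions: Dict[str, str], native: Dict[str, str]) -> float:
    """Fraction of ligands whose predicted target is their native protein."""
    hits = sum(1 for ligand, protein in predictions.items()
               if native[ligand] == protein)
    return hits / len(predictions)


# Toy data (invented): protein P3 is scored about 2 units too favourably
# for every ligand, i.e. it carries interprotein noise.
scores = {
    "lig1": {"P1": -8.0, "P2": -6.0, "P3": -9.0},
    "lig2": {"P1": -5.5, "P2": -7.5, "P3": -8.5},
    "lig3": {"P1": -5.0, "P2": -6.0, "P3": -10.0},
}
native = {"lig1": "P1", "lig2": "P2", "lig3": "P3"}
offsets = {"P3": -2.0}  # hypothetical correction for the over-scored protein

raw_rate = success_rate(predict_targets(scores), native)                 # 1 of 3 correct
corrected_rate = success_rate(predict_targets(scores, offsets), native)  # 3 of 3 correct
```

In this toy case every ligand is assigned to P3 before correction, even though each ligand's native protein scores well relative to the other non-native proteins. Removing the protein-specific offset restores the correct assignments, which is the kind of effect the balance-based term is reported to have.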
BCGlideScore had three features associated with the target-prediction improvement: the balance-based correction term corrected the "interpocket" noise, the correction term reduced the correlation between the balance descriptor and the BCGlideScore, and the correction term might represent a rough estimate of protein entropic changes. The "extra precision" (XP) mode, whose conformational search and scoring function are optimized for better correlation between docking score and binding affinity, is another Glide mode for molecular docking. Using an analysis protocol similar to the one used for the "standard precision" mode, we found that XPEmodelScore showed the highest accuracy in target prediction, and our data indicated that there was interprotein noise in the XPEmodelScores. However, we were unable to identify any ligand or protein property that was strongly associated with the noise and had the potential to correct XPEmodelScores. This was likely a result of our small descriptor pool. With more descriptors to characterize the ligand and protein properties, we might be able to find a property suitable for noise correction. In this regard, interaction fingerprints may have considerable potential. In addition, a significantly increased correlation between docking score and binding affinity would certainly improve prediction accuracy in both compound-library and protein-library screening. However, although our results showed that XPGlideScores correlated better with binding affinity than the standard-mode GlideScores, XPGlideScore performed poorly in target prediction (only 22.0% success) compared with GlideScore's accuracy of 57%.
The above results suggest that a slightly improved correlation does not necessarily translate into improved accuracy in protein-library screening.
We also found that each of the docking scoring objectives (predicting the optimal binding conformation, predicting the ligands that potentially bind a protein, and predicting the potential targets of a ligand) emphasizes different aspects of ligand-protein binding. It may therefore be possible, and more effective, to develop specialized scoring functions for individual objectives. Theoretically, an omnipurpose scoring function exists, but it always requires intensive computation to estimate. Developing specialized functions for different scoring objectives is a strategy that can reduce the precision required of each specialized function. Preparing more comprehensive and representative datasets to train and test more specialized scoring functions might also be easier. Therefore, separating the scoring objectives may hold the key to developing simpler yet more effective scoring systems. This is the first discussion of the discovery and correction of interprotein scoring noise in reverse docking. It is our hope that this focused discussion of the Glide scores will invite further efforts to characterize and normalize this type of interprotein noise in all docking scores, so that better target-prediction accuracy can be achieved with the reverse docking strategy. We will continue to work on developing specialized scoring functions for reverse docking.
Keywords/Search Tags: virtual screening, scoring function, noise correction, structure feature, statistical learning