Development Of Computational Methods On Molecular Similarity

Posted on: 2014-08-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Q Cai | Full Text: PDF
GTID: 1268330425980897 | Subject: Computer application technology
Abstract/Summary:
The process of new drug discovery and development is generally regarded as time-consuming, costly and risky. A typical drug discovery cycle lasts about 14 years, and the corresponding expense ranges from 800 to 1000 million US dollars. Although investment in drug discovery has grown significantly over the last decades, the output has not grown in proportion to the investment, owing to low efficiency and a high failure rate. Computer-aided drug design, and virtual screening in particular, is one of the most efficient ways to shorten the drug discovery cycle and to reduce both the cost and the failure rate. Molecular similarity evaluation methods are widely applied in virtual screening. This thesis discusses the design, implementation and application of several novel virtual screening methods based on molecular similarity. These methods cover several application scopes: similarity evaluation between small molecules, between small molecules and binding sites, and between binding sites, as follows:

1) This thesis implemented a rotation-invariant molecular surface shape descriptor, using a spherical function representation and spherical harmonic projection, together with a weighted similarity evaluation method that takes into account the shape features specific to the dataset. For each target, a genetic algorithm based search was applied to find the optimal weights that best separate the actives from the decoys in the training set. This was developed into a novel molecular similarity evaluation method named SHeMS. The test results indicated that the method with weight optimization significantly outperformed the one without weight optimization.

2) We extended the application of the spherical harmonic molecular shape descriptor to the identification of active molecules using pattern recognition methods. The spherical harmonic molecular shape descriptor was used to represent the molecules, and different types of classifiers were constructed to recognize the actives, including a naive Bayesian classifier, a decision tree, an artificial neural network and a support vector machine. To cope with the imbalanced dataset problem, several balancing strategies were adopted to improve the standard training algorithm. The classification results on a specified dataset indicated that the balanced training algorithm could greatly relieve the negative influence of imbalanced data and achieve reasonable classification performance.

3) This thesis designed and implemented a molecular similarity evaluation method based on Gaussian volume and molecular alignment, named SimG. Gaussian functions were adopted to represent the molecular volume, and a highly efficient downhill simplex search algorithm was used to guide the molecular alignment process. The method has wide applicability and can be used for both ligand-based and structure-based virtual screening. Meanwhile, comparison and analysis of the virtual screening results obtained by the ligand-based and structure-based strategies indicated that the performance of structure-based virtual screening was significantly related to the structure (especially the closeness) of the binding site adopted as the query template.

4) This thesis designed and implemented a binding site similarity evaluation method based on matching residue position and type. This method uses the 3D position and type information of corresponding residues to align binding sites and evaluate their similarity. The alignment process is guided by a simplex search algorithm, and the Hungarian algorithm is adopted to identify the residue correspondence. The test results on the corresponding datasets indicated that the proposed method has great potential for protein classification and screening based on binding site similarity.
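
As a rough illustration of the Gaussian volume idea in item 3), the sketch below scores the shape overlap of two molecules that are already placed in a common frame. It is not SimG itself: the abstract does not give SimG's formulation, so everything here is an assumption made for illustration. Each atom is modeled as one spherical Gaussian p*exp(-alpha*|r - R|^2), only pairwise (first-order) overlaps are summed, the amplitude p = 2.7 and the width alpha = 0.8 are hypothetical values, and the function names are invented. The alignment search (downhill simplex in the thesis) is left out; in practice such a score would be the objective that the search maximizes over rigid-body poses.

```python
import math

def gaussian_overlap(center_a, alpha_a, center_b, alpha_b, p=2.7):
    """Closed-form overlap integral of two spherical Gaussians
    p*exp(-alpha*|r - R|^2). The amplitude p is a hypothetical choice."""
    d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p * p * math.exp(-alpha_a * alpha_b * d2 / s) * (math.pi / s) ** 1.5

def overlap_volume(mol_a, mol_b):
    """First-order overlap: sum over all atom pairs (no higher-order terms).
    A molecule is a list of (center, alpha) tuples (assumed representation)."""
    return sum(
        gaussian_overlap(ca, aa, cb, ab)
        for ca, aa in mol_a
        for cb, ab in mol_b
    )

def shape_tanimoto(mol_a, mol_b):
    """Tanimoto-style shape similarity for a fixed relative pose."""
    v_ab = overlap_volume(mol_a, mol_b)
    v_aa = overlap_volume(mol_a, mol_a)
    v_bb = overlap_volume(mol_b, mol_b)
    return v_ab / (v_aa + v_bb - v_ab)

if __name__ == "__main__":
    # Two toy "molecules": the second is the first shifted along x.
    query = [((0.0, 0.0, 0.0), 0.8), ((1.5, 0.0, 0.0), 0.8)]
    shifted = [((x + 1.0, y, z), a) for (x, y, z), a in query]
    print(shape_tanimoto(query, query))
    print(shape_tanimoto(query, shifted))
```

The score equals 1 for identical, perfectly superposed molecules and falls toward 0 as the overlap shrinks, which is what makes it usable as a ranking score in virtual screening.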
Keywords/Search Tags: Computer-Aided Drug Design, Molecular Similarity, Spherical Harmonics, Gaussian Volume, Pattern Recognition