Querying and Mining Chemical Databases for Drug Discovery

Posted on:2013-01-06

Degree:Ph.D

Type:Thesis

University:University of California, Santa Barbara

Candidate:Ranu, Sayan

GTID:2458390008469567

Subject:Computer Science

Abstract/Summary:

Drug discovery and development has exploded into a multi-billion dollar industry. Unfortunately, despite a steady increase in pharmaceutical research, the number of new drugs discovered has been, at best, flat. The low productivity of current approaches to drug discovery has been ascribed to a number of factors, including a narrow focus on a single protein target and undesirable effects, such as toxicity, that are discovered too late in the discovery process. In this dissertation, I propose strategies to combat the low productivity of current drug-discovery techniques and show that by integrating the principles of statistical significance and diversity into the molecular analysis framework, we can accelerate the rate of drug discovery.

In the first part of my thesis, I explore the importance of mining statistically significant patterns from large collections of scientific data and demonstrate their utility in drug discovery. I show that over-represented subgraphs in molecular databases are correlated with biological activity and can be used to learn accurate classification models. Furthermore, statistically significant pharmacophoric patterns can be employed to predict the binding mechanisms between small molecules and protein targets. Finally, I show that mining discriminative subgraphs from protein-protein interaction networks allows us to learn the complex network-encoded logic functions that decide the clinical outcomes of diseases.

In the second part of my thesis, I explore the importance of structural diversity in top-k queries and develop index structures to answer such queries in a scalable manner. First, I explore the importance of modeling attractive and repulsive dimensions in molecular analysis and demonstrate their utility in going beyond traditional similarity or distance measures. Next, I show that diversity-aware top-k answer sets are informationally denser than traditional top-k answer sets.

Overall, this thesis proposes core indexing and mining algorithms that extend the current state of the art in computer science research. Among the various applications of the developed algorithms, impact in the field of drug discovery is the unifying theme that binds the chapters together. However, these methods are also applicable in other scientific domains such as software bug mining and the analysis of communication graphs, social networks, sensor networks, and transportation networks.
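
The contrast between a traditional top-k answer set and a diversity-aware one can be illustrated with a minimal sketch. This is not the index structure or algorithm developed in the thesis. It is a simple greedy filter, and every name and value in it (the toy fingerprints, the scores, the `max_sim` threshold, and the functions `top_k` and `diverse_top_k`) is a hypothetical choice made for illustration. Molecules are represented as sets of feature identifiers and compared with the Tanimoto coefficient.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Traditional top-k: the k highest-scoring items, ties broken by name."""
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]


def diverse_top_k(
    scores: Dict[str, float],
    fingerprints: Dict[str, FrozenSet[int]],
    k: int,
    max_sim: float,
) -> List[str]:
    """Greedy diversity-aware top-k (illustrative assumption, not the thesis method).

    Scan items by descending score and keep an item only if its similarity
    to every item already kept is at most max_sim.
    """
    chosen: List[str] = []
    for m in sorted(scores, key=lambda m: (-scores[m], m)):
        if all(tanimoto(fingerprints[m], fingerprints[c]) <= max_sim for c in chosen):
            chosen.append(m)
            if len(chosen) == k:
                break
    return chosen


# Toy data: A, B, and C are near-duplicates; D and E are structurally distinct.
fingerprints = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 3, 6}),
    "D": frozenset({7, 8, 9}),
    "E": frozenset({10, 11}),
}
scores = {"A": 0.95, "B": 0.93, "C": 0.90, "D": 0.80, "E": 0.60}

plain = top_k(scores, 3)
diverse = diverse_top_k(scores, fingerprints, 3, max_sim=0.5)
print(plain)    # ['A', 'B', 'C']
print(diverse)  # ['A', 'D', 'E']
```

In this toy data the plain answer set returns three near-duplicate molecules, while the diversity-aware set covers three distinct structural families with the same k. That is one informal reading of "informationally denser".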

Keywords/Search Tags:

Drug discovery, Mining, Networks

Related items

1	The Discovery Of Adverse Drug Reactions Based On Comment Mining
2	The Construction Of Collaborative Drug Discovery Platform And Research On Its Trust Mechanism
3	Mining high-throughput screening data to accelerate drug lead discovery
4	In silico methods for discovery in proteomics, systems biology and drug discovery
5	Research On Grid Workflow And Its Application In Drug Discovery
6	Statistical learning in drug discovery via clustering and mixtures
7	Data Mining/Machine Learning Techniques for Drug Discovery: Computational and Experimental Pipeline Developmen
8	Research On Mining Community From Emails
9	A Method Of Community Discovery In Social Networks Based On Local Node Importance
10	Research On Data Mining Method For Drug-related Information Based On Data Integration