In silico methods for discovery in proteomics, systems biology and drug discovery

Posted on:2011-08-29

Degree:Ph.D

Type:Thesis

University:Princeton University

Candidate:DiMaggio, Peter Anthony, Jr

Full Text:PDF

GTID:2448390002963862

Subject:Engineering

Abstract/Summary:

A grand problem in biology is understanding the fundamentals of how an organism functions. Insights into this problem are provided by several complementary disciplines of research, including genomics, transcriptomics, proteomics, and metabolomics. Several revolutionary high-throughput and/or high-sensitivity platforms have emerged within these disciplines, enabling researchers to probe large-scale systems, for example by measuring changes in genome expression in response to environmental perturbations or by identifying and quantifying the peptides and proteins present in an organism. These experimental methodologies produce complex and often noisy datasets that require sophisticated and robust data analysis tools. This thesis presents several models and algorithms that address three main areas of open research problems in bioinformatics: (1) mass-spectrometry-based proteomics, (2) biclustering in systems biology motivated by DNA microarray data, and (3) the clustering of molecular compound libraries in drug discovery and toxicology.

In the field of proteomics, tandem mass spectrometry coupled with high-performance liquid chromatography (HPLC) has emerged as a high-throughput, high-sensitivity platform for quantifying and identifying the peptides and proteins present in an organism. De novo and hybrid methodologies based on integer linear optimization have been developed to identify the peptide corresponding to a given tandem mass spectrum. It is demonstrated that the proposed algorithms achieve superior predictive performance over existing de novo, hybrid, and database-driven peptide identification methods on several test sets of tandem mass spectrometry data.

In systems biology, microarray experiments are commonly used for simultaneously measuring the transcription levels of thousands of genes. Given this vast amount of dense data, the primary goal is to elucidate genes that are co-regulated by identifying genes that are co-expressed in the experiment, based upon similar changes in their expression levels over the various environmental conditions. A biclustering method based on the iterative optimal re-ordering of the rows (genes) and columns (conditions) of the gene expression data matrix was developed to analyze these dense data matrices. It is shown that the proposed biclustering method produces a better grouping of related genes than other algorithms for several data sets in systems biology and is also an effective method for sample classification (i.e., it can partition normal and disease tissue samples).

In drug discovery applications, an element of a data matrix corresponds to a unique molecular compound, and the value of this element is some measure of drug efficacy for the compound. These data matrices are very sparse in practice, as the experiments associated with synthesizing and measuring even a fraction of the total compounds are cost-prohibitive. Thus, it is desirable to guide future compound synthesis towards target molecules that have the highest likelihood of being successful drug candidates. To address this problem, a novel clustering algorithm based on integer linear optimization was developed to optimally reorder sparse drug inhibition data matrices. The algorithm is effective in grouping together drug molecules with desired property values. It is demonstrated that integrating the proposed clustering algorithm into an iterative framework is an effective strategy for directing the synthesis of additional compounds towards molecules with high efficacy.
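
The idea of re-ordering a data matrix so that similar rows sit next to each other can be illustrated with a toy sketch. This is not the thesis's integer linear optimization model: it reorders rows only, uses exhaustive search (feasible only for a handful of rows), and assumes a simple cost, the sum of absolute differences between neighboring rows. The function names and the expression values are made up for illustration.

```python
from itertools import permutations


def adjacent_cost(matrix, order):
    """Sum of absolute differences between each pair of neighboring rows."""
    return sum(
        abs(a - b)
        for i, j in zip(order, order[1:])
        for a, b in zip(matrix[i], matrix[j])
    )


def best_row_order(matrix):
    """Try every row ordering and keep the cheapest one (tiny inputs only)."""
    return min(
        permutations(range(len(matrix))),
        key=lambda order: adjacent_cost(matrix, order),
    )


# Toy expression matrix: rows are genes, columns are conditions (made-up values).
expression = [
    [5.0, 5.1, 0.2],
    [0.1, 0.3, 4.9],
    [4.8, 5.0, 0.0],
    [0.0, 0.2, 5.0],
]

order = best_row_order(expression)
print(order)
```

In the resulting order, genes 0 and 2 are neighbors and genes 1 and 3 are neighbors, since each pair has nearly identical expression profiles. The same search could be applied to the transposed matrix to reorder conditions; realistic matrix sizes would need an optimization formulation rather than enumeration.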

Keywords/Search Tags:

Biology, Drug, Proteomics, Data, Discovery, Method, Compound

Related items

1	Research On Workflow Matching And Discovery Based On Data Unification For Proteomics
2	Research For Drug Knowledge Oriented Compound-target Network Model
3	The Construction Of Collaborative Drug Discovery Platform And Research On Its Trust Mechanism
4	Querying and Mining Chemical Databases for Drug Discovery
5	Quantitative Proteomics Software Based On Mass Spectrometry
6	Research On Data Mining Method For Drug-related Information Based On Data Integration
7	Mining high-throughput screening data to accelerate drug lead discovery
8	Research On Grid Workflow And Its Application In Drug Discovery
9	Data Analysis For High Content RNA Interference Screening: Pattern Recognition Approaches For Certain Systems Biology Application
10	Statistical learning in drug discovery via clustering and mixtures