In silico methods for discovery in proteomics, systems biology and drug discovery

Posted on: 2011-08-29
Degree: Ph.D
Type: Thesis
University: Princeton University
Candidate: DiMaggio, Peter Anthony, Jr
GTID: 2448390002963862
Subject: Engineering

Abstract/Summary:

A grand problem in biology is understanding the fundamentals of how an organism functions, and insights into this problem are provided by several complementary disciplines of research, including genomics, transcriptomics, proteomics, and metabolomics. Several revolutionary high-throughput and/or high-sensitivity platforms have emerged within these disciplines that have enabled researchers to probe large-scale systems, for example by measuring the changes in genome expression in response to environmental perturbations or by identifying and quantifying the peptides and proteins present in an organism. The outputs of these experimental methodologies are complex and often noisy datasets that require sophisticated and robust data analysis tools. This thesis presents several models and algorithms that address three main areas of open research problems in bioinformatics: (1) mass spectrometry-based proteomics, (2) biclustering in systems biology motivated by DNA microarray data, and (3) the clustering of molecular compound libraries in drug discovery and toxicology.

In the field of proteomics, tandem mass spectrometry coupled with high performance liquid chromatography (HPLC) has emerged as a high-throughput, high-sensitivity platform for quantifying and identifying the peptides and proteins present in an organism. De novo and hybrid methodologies based on integer linear optimization have been developed to identify the peptide corresponding to a given tandem mass spectrum.
It is demonstrated that the proposed algorithms resulted in superior predictive performance over other existing de novo, hybrid, and database-driven peptide identification methods for several test sets of tandem mass spectrometry data.

In systems biology, microarray experiments are commonly used for simultaneously measuring the transcription levels of thousands of genes. Given this vast amount of dense data, the primary goal is to elucidate genes that are co-regulated by identifying genes that are co-expressed in the experiment, based upon similar changes in their expression levels over the various environmental conditions. A biclustering method based on the iterative optimal re-ordering of the rows (genes) and columns (conditions) of the gene expression data matrix was developed to analyze these dense data matrices. It is shown that the proposed biclustering method resulted in a better grouping of related genes than other algorithms for several data sets in systems biology and is also an effective method for sample classification (i.e., it has the ability to partition normal and disease tissue samples).

In drug discovery applications, an element of a data matrix corresponds to a unique molecular compound, and the value of this element is some measure of drug efficacy for the compound. These data matrices are very sparse in practice, as the experiments associated with synthesizing and measuring even a fraction of the total compounds are cost-prohibitive. Thus, it is desirable to guide future compound synthesis towards target molecules that have the highest likelihood of being successful drug candidates. To address this problem, a novel clustering algorithm based on integer linear optimization was developed to optimally reorder sparse drug inhibition data matrices. The algorithm is effective in grouping together drug molecules with desired property values.
It is demonstrated that integrating the proposed clustering algorithm into an iterative framework is an effective strategy for directing the synthesis of additional compounds towards molecules with high efficacy.

Keywords/Search Tags: Biology, Drug, Proteomics, Data, Discovery, Method, Compound
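
As a rough illustration of what "optimal re-ordering" of a data matrix can mean, the sketch below exhaustively searches for the row order and column order that place similar rows (or columns) next to each other. This is not the integer linear optimization model developed in the thesis: the objective (sum of squared differences between adjacent rows), the brute-force search, and all function names are assumptions chosen for illustration, and the search is only feasible for very small matrices.

```python
from itertools import permutations


def adjacent_cost(rows):
    """Sum of squared differences between each pair of neighbouring rows."""
    return sum(
        (a - b) ** 2
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )


def best_row_order(matrix):
    """Try every row permutation and return the one with the lowest cost.

    Hypothetical helper: exact, but factorial in the number of rows.
    """
    return min(
        permutations(range(len(matrix))),
        key=lambda order: adjacent_cost([matrix[i] for i in order]),
    )


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def reorder(matrix):
    """Re-order rows, then columns, under the assumed adjacency objective."""
    matrix = [matrix[i] for i in best_row_order(matrix)]
    cols = transpose(matrix)
    cols = [cols[j] for j in best_row_order(cols)]
    return transpose(cols)


# Toy "expression" matrix: rows 0 and 2 are similar, as are rows 1 and 3.
toy = [
    [1, 1, 9, 9],
    [9, 9, 1, 1],
    [1, 1, 9, 8],
    [9, 8, 1, 1],
]
for row in reorder(toy):
    print(row)
```

Under this particular objective the row and column problems are independent, so a single pass over each suffices; a practical method for thousands of genes would need a formulation that scales, which the brute-force search here does not.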