
Quantitative, predictive modeling of biochemical networks: A machine learning and information-theoretic approach

Posted on: 2009-11-30
Degree: Ph.D.
Type: Dissertation
University: Columbia University
Candidate: Ziv, Etay
Full Text: PDF
GTID: 1448390002491407
Subject: Engineering
Abstract/Summary:
In the post-genomic era, advances in high-throughput technologies and statistical inference have automated the task of enumerating cellular components and their interactions, generating an unprecedented volume of data about biochemical networks. Although these data provide more detail than genomic sequences alone, interaction data by themselves do not yield new biological insight. Quantitative analysis of biochemical networks can elucidate mechanisms of normal and pathological cell function, make experimental predictions, and provide novel biological insights. In this dissertation we apply tools from machine learning and information theory to analyze such networks at multiple scales, building quantitative, predictive models to resolve three open questions in biology.

At the scale of small networks, we investigate the input-output relation of stochastic transduction networks and find that all such networks can transmit input information optimally. Our analysis suggests a potential solution to the "cross-talk" dilemma: how multiple distinct inputs that converge onto a single pathway can nonetheless reliably trigger the appropriate output responses. Networks can transmit information robustly under large (tenfold) parameter perturbations and can simultaneously adapt their behavior without significantly compromising transmission fidelity. We demonstrate that certain network features confer an advantage in noise regulation and therefore in information transmission. Predictions are validated using a database of known transcription factors.

Using counts of subgraphs that appear in known protein-protein interaction networks, we apply a recently developed machine learning algorithm to discriminate between proposed network evolution mechanisms. We then classify the protein-protein interaction networks of D. melanogaster and S. cerevisiae as arising from one of these mechanisms.
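The abstract does not say which subgraphs or which classifier were used. As a rough illustration of the feature-extraction step only, the sketch below counts three small subgraph types (edges, wedges, triangles) in an undirected interaction network given as an edge list. The function name `subgraph_counts` and this particular choice of subgraphs are hypothetical, not the dissertation's feature set; in a full pipeline, count vectors like these for real and simulated networks would be passed to a trained classifier.

```python
from itertools import combinations

def subgraph_counts(edges):
    """Count edges, wedges and triangles in an undirected graph.

    Hypothetical feature extractor: `edges` is an iterable of (u, v) pairs;
    self-loops are skipped and duplicate edges are counted once.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not form these subgraphs
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Every undirected edge appears in two adjacency sets.
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    # A wedge is a 2-path a-c-b centred on c: C(deg(c), 2) per node, open or closed.
    n_wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    # A triangle is seen once from each of its three corners, hence // 3.
    n_triangles = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    ) // 3
    return {"edges": n_edges, "wedges": n_wedges, "triangles": n_triangles}


# Feature vector for a toy network: a triangle 1-2-3 with a pendant node 4.
features = subgraph_counts([(1, 2), (2, 3), (1, 3), (3, 4)])
print(features)  # {'edges': 4, 'wedges': 5, 'triangles': 1}
```

Counting via adjacency sets keeps the sketch dependency-free; larger motif libraries would normally come from a dedicated graph package.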
Our results predict that these protein networks most closely resemble a particular duplication-mutation mechanism. The predictions are confident (high probability), consistent (across species and feature spaces), and robust (to errors in the interaction data). These results also provide statistical validation of recent evidence from comparative genomic sequence data demonstrating the importance of functional preservation and paralog self-linking.

Finally, at the scale of the network's global organization, we adapt a principled, information-theoretic, unsupervised learning algorithm to identify modules in a network. On synthetic networks, the algorithm compares favorably with the standard module-discovery technique both in identifying the correct modules and in determining the true number of modules. We present the first quantitative, parameter-free definition of network modularity, a dimensionless number between 0 and 1, and use it to demonstrate that the E. coli genetic regulatory network is modular. We present applications to a social network of physics collaborators and to the E. coli network.
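Information transmission of the kind studied in the small-network analysis is conventionally quantified by the mutual information between a network's input and output. The sketch below computes it for a discrete channel. It assumes the network has already been reduced to an input distribution `p_in` and a conditional matrix `channel[x][y] = P(Y=y | X=x)`; both, like the function name `mutual_information`, are hypothetical stand-ins for the stochastic transduction models in the dissertation, whose actual formulation is not reproduced here.

```python
import math

def mutual_information(p_in, channel):
    """Mutual information I(X;Y) in bits for a discrete memoryless channel.

    p_in[x] is P(X=x); channel[x][y] is P(Y=y | X=x), each row summing to 1.
    """
    n_out = len(channel[0])
    # Output marginal: P(y) = sum_x P(x) P(y|x).
    p_out = [sum(p_in[x] * channel[x][y] for x in range(len(p_in))) for y in range(n_out)]
    mi = 0.0
    for x, px in enumerate(p_in):
        for y, pyx in enumerate(channel[x]):
            if px > 0 and pyx > 0:  # zero-probability terms contribute nothing
                mi += px * pyx * math.log2(pyx / p_out[y])
    return mi


uniform = [0.5, 0.5]
noiseless = [[1.0, 0.0], [0.0, 1.0]]
flip = 0.1  # hypothetical noise level: the output flips with probability 0.1
noisy = [[1 - flip, flip], [flip, 1 - flip]]
print(mutual_information(uniform, noiseless))  # 1.0
print(mutual_information(uniform, noisy))
```

A noiseless binary channel with a uniform input carries exactly 1 bit; with a 10% flip probability the same input carries about 0.53 bits. Optimal transmission in the channel-capacity sense means maximizing this quantity over `p_in`.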
Keywords/Search Tags: Network, Machine learning, Quantitative, Information