Computational approaches for biological data analysis

Posted on: 2011-12-24
Degree: Ph.D.
Type: Thesis
University: Tufts University
Candidate: Wei, Xintao
Full Text: PDF
GTID: 2448390002465111
Subject: Biology
Abstract/Summary:
Due to the development of molecular biological technologies and techniques, more and more large-scale biological data sets are becoming available. Identifying biologically useful information from these data sets has become an important challenge. Computational biology aims to address this challenge. In this thesis, we introduce several computational techniques for analyzing different types of biological data to study human disease.

As the number of sequenced genomes has grown, we have become increasingly aware of the impact of horizontal gene transfer on our understanding of genome evolution. We introduce a new method for detecting horizontal transfer that incorporates the distances typically used by phylogeny-based methods, rather than the trees themselves. We demonstrate that the distance method is scalable and that it performs well precisely in cases where phylogenetic approaches struggle. We conclude that a distance-based approach may be a valuable addition to the set of tools currently available for identifying horizontal gene transfer.

Next, we look at HIV-human protein-protein interaction (PPI) data to better characterize PPIs between viruses and their hosts. We demonstrate that HIV proteins tend to interact with many human proteins, viral proteins tend to interact with human hubs, and the hub proteins with which they interact tend to be preferentially conserved. Furthermore, we design two new approaches to predict the PPIs between viruses and hosts and test these methods on HIV integrase and its human partners.

We also design a noise-tolerant method to detect gene sets by biclustering gene expression data. The approach is evaluated on a human metabolic gene expression dataset and a mouse developmental dataset. We show that our approach has strong power to detect functionally related gene sets for use in the analysis of novel gene expression data.

Finally, using the chemosensitivity values of the NCI-60 panel of cancer cell lines and their gene expression profiles, we design an approach to predict the chemosensitivity values of new samples to anti-cancer drugs. Our approach predicts the actual sensitivity values rather than discrete sensitivity classes. Our analysis shows that our method performs significantly better than chance, and works well on an identifiable subset of compounds. We then test and validate this hypothesis by predicting the chemosensitivity of primary tumors to the topoisomerase I inhibitor topotecan, using a model trained only on the NCI-60 cell lines.
Keywords/Search Tags: Biological data, Approach, Gene expression, Computational, Sets