
Devising effective similarity measures and learning algorithms for the study of metazoan gene expression

Posted on: 2012-10-09
Degree: Ph.D.
Type: Dissertation
University: Princeton University
Candidate: Chikina, Maria
Full Text: PDF
GTID: 1458390008499499
Subject: Biology
Abstract/Summary:
With the advent of genome sequencing and modern high-throughput technologies, there is an increasing need for effective and robust methods to turn terabytes of data into biological knowledge. Every new data type and biological problem presents unique analysis challenges; this work focuses on several problems specific to metazoan gene expression.

Methods for analyzing high-dimensional data fall into two broad classes: unsupervised and supervised. Unsupervised approaches characterize the components of a dataset without any a priori input, while supervised methods rely on existing knowledge to find predictive patterns within datasets. In this work we develop and apply methods that span a range of data analysis techniques. Chapters 2 and 3 concern supervised methods for predicting tissue expression and tissue-specific interactions in C. elegans. We first apply and analyze a well-characterized method, Support Vector Machines, to predict tissue-specific expression, and then extend it into two novel methods that allow us to address more complex problems. In the next two chapters we develop two unsupervised methods that address the question of how to define biologically meaningful similarity metrics: we present a powerful yet transparent method to analyze the functional similarity of genes across species, and a statistically rigorous method to compare ChIPseq experiments, a relatively new technique that produces complexly structured data.
Keywords/Search Tags: Methods, Data, Similarity, Expression