Interpretable set analysis for high-dimensional data

Posted on: 2012-07-03
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Boca, Simina M
Full Text: PDF
GTID: 1468390011962957
Subject: Biology
Abstract/Summary:
A ubiquitous problem in high-dimensional analysis is the identification of predefined sets that are enriched for features showing an association of interest. In this situation, inference is performed on sets, not on individual features. We develop new methods for set analysis that answer relevant scientific questions and improve the interpretability of results, with a focus on gene-set analysis. We show that our methods are more transparent, more interpretable, and better suited to the scientific questions than traditional methods.

We introduce a method designed for somatic mutation studies of cancer that initially scores each gene-set at the patient level rather than the gene level. We apply it to several genome-wide mutation studies of human cancers.

We also present a decision-theoretic approach that focuses on estimating the fraction of non-null features in a set and is easier to interpret than p-value methods. We apply it to two datasets from genomics and one dataset from brain imaging.

Finally, we discuss several possible methods for clustering genes based on their gene-set annotations. This clustering provides exploratory insights into how genes work together by pointing out potential problems with the existing annotations, and it creates non-overlapping set annotations for use in statistical gene-set analyses.
Keywords/Search Tags: Gene-set