
Statistical methods for gene set annotation optimization, unsupervised gene set testing and independent gene set filtering

Posted on: 2016-09-03
Degree: Ph.D.
Type: Dissertation
University: Dartmouth College
Candidate: Frost, Hildreth Robert
Full Text: PDF
GTID: 1474390017476653
Subject: Bioinformatics
Abstract/Summary:
Gene set testing has become a critical tool for interpreting the results of high-throughput genomic experiments. Despite the development of robust statistical methods and extensive gene set collections, however, the results from gene set testing are often inaccurate, underpowered and non-reproducible across experiments. The utility of gene set testing is also limited by the lack of effective techniques for enrichment analysis of unsupervised data. In this dissertation, four novel statistical methods are described that address these challenges: entropy minimization over variable clusters (EMVC), principal component gene set enrichment (PCGSE), spectral gene set enrichment (SGSE) and spectral gene set filtering (SGSF). EMVC optimizes gene set annotations to best match the structure of empirical data. PCGSE and SGSE support unsupervised gene set testing in terms of the principal components of genomic data. SGSF improves gene set testing power via independent filtering of large gene set collections. The effectiveness of each method relative to available approaches is demonstrated using both simulated and real genomic data and gene sets. Implementations of all methods are available as R packages on CRAN.
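The idea of testing gene sets against principal components can be illustrated with a small sketch. The code below is not the dissertation's implementation and does not reproduce the R packages; it is a simplified illustration under stated assumptions: the gene-level statistic is taken to be the absolute PC loading, the gene set statistic is a Welch t-statistic comparing loadings inside and outside the set, and correlation between genes is ignored. The function name `pc_gene_set_statistic` and the simulated data are hypothetical.

```python
import numpy as np

def pc_gene_set_statistic(X, gene_set_idx, pc=0):
    """Welch t-statistic comparing absolute PC loadings inside vs. outside a gene set.

    X is a samples-by-genes matrix; gene_set_idx holds the column indices of the set.
    Illustrative only: inter-gene correlation is not accounted for.
    """
    Xc = X - X.mean(axis=0)  # center each gene (column)
    # Rows of Vt are the principal component loading vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = np.abs(Vt[pc])

    in_set = np.zeros(X.shape[1], dtype=bool)
    in_set[gene_set_idx] = True
    a, b = loadings[in_set], loadings[~in_set]

    # Standard error for two groups with unequal variances (Welch).
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# Simulated data (hypothetical): 50 samples, 200 genes,
# where the first 20 genes share a latent factor that drives the first PC.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.normal(size=(n, p))
factor = rng.normal(size=n)
X[:, :20] += 2.0 * factor[:, None]

t_enriched = pc_gene_set_statistic(X, np.arange(20))     # set aligned with PC 1
t_null = pc_gene_set_statistic(X, np.arange(100, 120))   # unrelated set
```

In this simulation the set containing the factor-driven genes receives a much larger statistic on the first principal component than the unrelated set. A practical method would also need to calibrate significance and adjust for correlation among genes, which this sketch omits.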
Keywords/Search Tags:Gene set, Methods, Genomic