
Unified sparse regression models for sequence variants association analysis

Posted on: 2017-06-30
Degree: Ph.D.
Type: Dissertation
University: Tulane University School of Science and Engineering
Candidate: Cao, Shaolong
Full Text: PDF
GTID: 1458390008975471
Subject: Biostatistics
Abstract/Summary:
Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and utilizes weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants while maintaining a lower false discovery rate than several commonly used feature selection methods. It can detect rare and common variants with almost equal efficiency.

After further investigation and assessment of the oracle property of the USR method, we propose a unified test (uFineMap) for accurately localizing causal loci and a unified test (uHDSet) for identifying high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals. These novel tests are based on scaled sparse linear mixed regressions with Lp (0
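
The three ingredients named in the USR description (a random effect for cryptic relatedness, fixed effects for population structure, and weighted penalties carrying prior knowledge) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it assumes an L1 penalty, a known kinship matrix K, and a fixed, user-supplied variance fraction h2 rather than an estimated one. The function name weighted_lasso_lmm and all parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso_lmm(y, X, C, K, weights, lam, h2=0.5, n_iter=200):
    """Illustrative weighted lasso under a linear mixed model (assumed form).

    y       : (n,) phenotype
    X       : (n, p) variants, penalized
    C       : (n, q) unpenalized fixed effects (e.g. intercept, ancestry PCs)
    K       : (n, n) kinship matrix for the random effect
    weights : (p,) penalty weights; a smaller weight encodes a stronger prior
    h2      : ASSUMED fraction of variance due to the random effect
    """
    n, p = X.shape
    # Assumed covariance V = h2*K + (1-h2)*I; whiten with V^{-1/2}.
    s, U = np.linalg.eigh(K)
    d = 1.0 / np.sqrt(h2 * np.clip(s, 0.0, None) + (1.0 - h2))
    W = (U * d) @ U.T
    yw, Xw, Cw = W @ y, W @ X, W @ C
    # Project out the unpenalized fixed effects (population structure).
    Q, _ = np.linalg.qr(Cw)
    yr = yw - Q @ (Q.T @ yw)
    Xr = Xw - Q @ (Q.T @ Xw)
    # Coordinate descent on 0.5*||yr - Xr b||^2 + n*lam*sum(w_j*|b_j|).
    beta = np.zeros(p)
    col_sq = (Xr ** 2).sum(axis=0)
    resid = yr.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xr[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, n * lam * weights[j]) / col_sq[j]
            resid += Xr[:, j] * (beta[j] - new)
            beta[j] = new
    return beta
```

In this sketch, lowering weights[j] for a variant with supporting prior evidence reduces its shrinkage, so it enters the model at a larger value of lam than an otherwise identical variant with weight 1.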
Keywords/Search Tags:Sparse, Structure, Cryptic relatedness, Models, USR, Unified, Variants, Data