
Large-scale Dependent Multiple Testing Via Extended Hidden Markov Models

Posted on: 2020-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P F Wang
GTID: 1360330620452330
Subject: Machine learning and bioinformatics
Abstract/Summary:
Large-scale multiple testing problems arise in many scientific applications. For example, in genome-wide association studies (GWAS), one needs to perform tens of thousands of tests to identify the single nucleotide polymorphisms (SNPs) associated with a complex disease. Other examples include neuroimaging data analysis (Shu et al., 2015), microarray data analysis (Liang and Dan, 2010; Liang et al., 2018), spatial data analysis (Sun et al., 2015), etc. To date, a number of multiple testing procedures have been applied in various scientific fields.

However, several difficult issues remain in large-scale multiple testing. First, the growing availability of "high throughput" data requires us to conduct tens of thousands of tests simultaneously, and traditional control criteria, such as the family-wise error rate (FWER), are overly conservative at that scale. Second, the tests in multiple testing often exhibit complex correlations. For instance, in GWAS, since adjacent genomic loci tend to co-segregate in meiosis, the tests arising from GWAS always exhibit complex dependence. Third, multiple testing procedures that ignore covariate effects may suffer a loss of testing efficiency. For example, in large-scale two-sample inference, properly employing the sparsity information can significantly improve the testing power.

To overcome the aforementioned issues, we propose a class of large-scale multiple testing procedures based on extended hidden Markov models. Theoretical results show that these novel multiple testing procedures control the false discovery rate (FDR) at the pre-specified level α and have the smallest false non-discovery rate (FNR) among all multiple testing procedures with FDR controlled at level α.

The main work of this dissertation is divided into five parts. In Chapter 1, we introduce background knowledge on multiple testing and some of the latest multiple testing procedures, including classical multiple testing methods based on FWER and FDR. In Chapter 2, we propose a covariate-adjusted multiple testing procedure, called the covariate-adjusted local index of significance (CALIS), to account for environmental factors via a factorial hidden Markov model. In Chapter 3, we develop a novel multiple testing procedure based on the Cartesian hidden Markov model, called repLIS, for replicability analysis in GWAS. Both theoretical and simulation results reveal that repLIS is capable of characterizing the local dependence of tests yielded by replicability analysis. In Chapter 4, we present a covariate-assisted multiple testing procedure, termed COALIS, for large-scale two-sample inference under dependence.
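As a rough illustration of how a local-index-of-significance rule can control FDR at level α, the sketch below applies a step-up rule of the kind commonly used with LIS-type statistics in the literature: sort the statistics in ascending order and reject the largest initial group whose running mean stays at or below α. This is an assumption made for illustration; the abstract does not state the exact decision rule used by CALIS, repLIS, or COALIS. The function name `lis_step_up` and the input values are hypothetical, and the LIS values are assumed to have already been computed (for example, as posterior null probabilities from a fitted hidden Markov model).

```python
def lis_step_up(lis, alpha):
    """Return a list of booleans marking which hypotheses are rejected.

    lis   : LIS-type statistics, one per hypothesis, where smaller values
            indicate stronger evidence against the null (assumed given).
    alpha : target FDR level.
    """
    # Indices of the hypotheses ordered from smallest to largest statistic.
    order = sorted(range(len(lis)), key=lambda i: lis[i])

    running_sum = 0.0
    k = 0  # number of hypotheses to reject
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # Keep the largest rank whose running mean is within the target level.
        if running_sum / rank <= alpha:
            k = rank

    reject = [False] * len(lis)
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical statistics for five tests.
decisions = lis_step_up([0.01, 0.9, 0.02, 0.5, 0.03], alpha=0.1)
print(decisions)
```

In this toy input the three smallest statistics have a running mean of 0.02, while adding the fourth raises it to 0.14, so only the first, third, and fifth hypotheses are rejected.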
Keywords/Search Tags: Multiple testing, GWAS, Hidden Markov models, Replicability analysis, Two-sample inference