
Estimating The Proportion Of True Null Hypotheses And Its Applications For Multiple Testing

Posted on: 2018-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: S S Tian | Full Text: PDF
GTID: 2310330536960088 | Subject: Computer application technology
Abstract/Summary:
With the continuing development of the information society and the emergence of massive data, multiple testing, an important theory for high-dimensional data analysis, has attracted the attention of many statisticians. It has a wide range of applications, including bioinformatics, medicine and genomics. This thesis focuses on estimating the proportion of true null hypotheses in multiple testing and on applications of that estimate.

The thesis first introduces the background, significance and research status of multiple testing and, through an analysis of existing work on estimating the proportion of true null hypotheses, determines its research emphasis. It then introduces the basic theory of multiple testing, points out that the central task is to control the type I error, presents several error measures with emphasis on the familywise error rate (FWER) and the false discovery rate (FDR), gives the definition and properties of the p-value, and proposes using p-values to test the hypotheses. Various FDR-controlling procedures are introduced for both independent and dependent tests, together with the two-stage FDR control method. The study of FDR-controlling methods shows why estimating the proportion of true null hypotheses matters, and the significance of this estimate is explained.

Next, the thesis reviews several existing estimators and, based on an analysis of them, proposes a new estimation method that applies cubic splines to the mean method proposed by Jiang and Doerge (2008). The methods are compared in simulation studies on uniform datasets, non-uniform datasets and simulated gene expression data with hidden dependence structures; in these simulations the new method performs well. In addition, a parametric mixture model is proposed for estimating the proportion of true null hypotheses. Four algorithms are given for the normal mixture model: the method of moments (MM), the EM algorithm, the combination of k-means clustering and the EM algorithm (KMEM), and the combination of modified k-means clustering and the EM algorithm (MKMEM). They are compared under different simulation conditions, and the estimation error of each is reported.

Finally, the thesis conducts a study based on microarray data, using three datasets: breast cancer gene expression data, tumor cell data and the GSE1743 renal transplantation data. The estimates obtained by the different methods on each dataset are reported and indicate the feasibility of the proposed method.
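
To illustrate why the proportion of true null hypotheses matters for FDR control, the following is a minimal sketch and not the method developed in the thesis. It assumes a simple Storey-type estimator averaged over a grid of thresholds, and plugs the estimate into a Benjamini-Hochberg step-up rule. The function names `estimate_pi0` and `adaptive_bh`, the threshold grid, and the default `alpha` are all illustrative assumptions.

```python
import numpy as np


def estimate_pi0(pvalues, lambdas=None):
    """Storey-type estimate of the proportion of true nulls, averaged over a grid.

    Assumption for illustration: p-values above a threshold lam come mostly
    from true nulls, which are uniform on [0, 1].
    """
    p = np.asarray(pvalues, dtype=float)
    if lambdas is None:
        lambdas = np.linspace(0.05, 0.90, 18)  # hypothetical default grid
    # For each lam: fraction of p-values above lam, rescaled by (1 - lam).
    estimates = [(p > lam).mean() / (1.0 - lam) for lam in lambdas]
    # A proportion cannot exceed 1, so cap the averaged estimate.
    return float(min(1.0, np.mean(estimates)))


def adaptive_bh(pvalues, alpha=0.05, pi0=None):
    """Benjamini-Hochberg step-up rule with the level rescaled by 1 / pi0."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if pi0 is None:
        pi0 = estimate_pi0(p)
    pi0 = max(pi0, 1.0 / m)  # guard against a zero estimate
    order = np.argsort(p)
    # Step-up thresholds i * alpha / (m * pi0) for i = 1..m.
    thresholds = np.arange(1, m + 1) * alpha / (m * pi0)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject
```

With `pi0=1.0` the rule reduces to the ordinary Benjamini-Hochberg procedure. An estimate below 1 raises every threshold, which is how a better estimate of the proportion of true nulls can increase the number of rejections at the same nominal FDR level.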
Keywords/Search Tags: multiple testing, proportion of true null hypotheses, normal mixture model, FDR, microarray