Statistical Theories And Methods For Several Problems Of Complex Genetic Data

Posted on: 2022-02-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J B Meng
Full Text: PDF
GTID: 1480306491959749
Subject: Machine learning and bioinformatics
Abstract/Summary:
In recent years, statistical methods have been widely used in research on complex genetic data. They can extract meaningful genetic information and uncover underlying genetic patterns, helping to reveal further genetic mysteries. With the rapid development of biotechnology, massive amounts of complex genetic data are now available. These data are high-dimensional, and when interactions are considered the number of variables increases exponentially, which poses further challenges for traditional statistical methods.

In this thesis, we take interactions into account and propose a new Bayesian variable selection method based on spike-and-slab priors. We use the Skinny Gibbs algorithm to approximate the high-dimensional non-sparse covariance matrix with a sparse matrix, thus avoiding the expensive computation of the standard Gibbs algorithm, and we prove that the method yields a stable posterior distribution and strong model selection consistency. Extensive simulation studies show that the method selects variables well under different genetic mechanism settings, and it is successfully applied to the genetic analysis of a real yeast cell cycle gene expression data set.

In rare variant association studies of complex genetic data, most existing methods have low power because rare variants occur at extremely low frequencies, so more effective tests are needed. We propose a rare variant association analysis method based on sparse contingency tables, constructing Markov bases under the framework of algebraic statistics and sampling on fibers to compute the p-value of the exact test. Simulation studies show that the proposed method has higher power than other methods, and that the advantage is greater when there is heterogeneity between random variables. The method is also successfully applied to a genetic association study of coronary heart disease and hypertension using data from the Wellcome Trust Case Control Consortium (WTCCC).

This thesis provides effective statistical analysis methods, and a statistical foundation, for the study of complex genetic data.
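The exact test by sampling on fibers can be illustrated with a minimal sketch. The abstract does not give the Markov bases constructed in the thesis for sparse contingency tables, so the code below assumes the simplest classical setting instead: a two-way table with fixed row and column sums, whose Markov basis consists of the basic moves that add +1, -1, -1, +1 on a 2 x 2 sub-table. The function names, the Pearson chi-square statistic, and the Metropolis-Hastings settings are all illustrative assumptions, not the author's implementation.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def exact_test_pvalue(table, n_steps=20000, burn_in=2000, seed=0):
    """Monte Carlo p-value of the exact test of independence.

    Runs a Metropolis-Hastings walk over the fiber (all non-negative
    tables sharing the observed row and column sums), targeting the
    hypergeometric distribution, which is proportional to
    1 / prod(n_ij!). Assumes at least two rows and two columns.
    """
    rng = random.Random(seed)
    current = [list(r) for r in table]
    n_rows, n_cols = len(current), len(current[0])
    observed = chi_square(table)
    hits = 0
    kept = 0
    for step in range(n_steps + burn_in):
        # Pick a basic move: +1 at (i1, j1) and (i2, j2),
        # -1 at (i1, j2) and (i2, j1). Margins are unchanged.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        a, d = current[i1][j1], current[i2][j2]
        b, c = current[i1][j2], current[i2][j1]
        if b > 0 and c > 0:  # otherwise the move leaves the fiber
            # Ratio of hypergeometric probabilities, new over old.
            ratio = (b * c) / ((a + 1) * (d + 1))
            if rng.random() < min(1.0, ratio):
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
        if step >= burn_in:
            kept += 1
            # Small tolerance guards against floating-point ties.
            if chi_square(current) >= observed - 1e-9:
                hits += 1
    return hits / kept


if __name__ == "__main__":
    # Hypothetical 2 x 3 case-control table (rows: cases, controls).
    example = [[12, 3, 1], [5, 9, 6]]
    print(exact_test_pvalue(example))
```

The proposal is symmetric because each basic move and its reverse are drawn with equal probability, so the acceptance ratio reduces to the ratio of target probabilities. For sparse tables with additional constraints, the basic moves above are generally not sufficient to connect the fiber, which is why a problem-specific Markov basis is needed.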
Keywords/Search Tags: complex genetic data, interaction, Bayesian variable selection, rare variants, contingency tables, algebraic statistics