Multiple Loci Methods For Estimating The Degree Of The Skewness Of X Chromosome Inactivation

Posted on: 2024-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: M K Li
GTID: 2530306926488494
Subject: Applied statistics
Abstract/Summary:
Background: Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some genetic diseases. Besides testing the association between X-chromosomal alleles and traits, it is also important to measure the degree of the skewness of XCI (denoted as γ). For a locus on the X chromosome that undergoes XCI-S, the three female genotypes (dd, Dd and DD, where D is the minor allele) can be coded as 0, γ and 2, respectively, depending on the expression level of the minor allele D. If XCI is skewed completely towards the minor allele D, the expression level of Dd is the same as that of dd, and γ = 0; if XCI is skewed completely towards the major allele d, the expression level of Dd is the same as that of DD, and γ = 2. Several methods have been proposed to estimate γ for a single locus, but they have the following shortcomings. First, the existing frequentist methods do not impose the constraint γ ∈ [0, 2] when constructing the statistics and only truncate the results to [0, 2] afterwards, which may yield extreme point estimates (0 or 2), empty sets and noninformative intervals ([0, 2]). Second, although the existing Bayesian methods avoid extreme point estimates, empty sets and noninformative intervals, they, like the frequentist methods, are built on a single locus and are not applicable to estimating γ for genes or genetic regions that contain multiple loci. Moreover, these methods usually perform poorly when the locus is a rare variant.

Objective: Taking genes as the unit of analysis, this study aims to propose methods that estimate γ from a multiple-loci perspective and incorporate the constraint γ ∈ [0, 2] in the estimation process.

Methods: We propose four point estimates of γ for genes, together with the corresponding interval estimation methods. Borrowing the idea of the burden test, we aggregate all the variants in a gene into a burden variable by selecting appropriate weights, and then estimate γ based on the burden variable. Based on the ratio of two regression coefficients, we propose the first point estimate of γ (denoted as γGF). We then correct the denominator and the numerator of this ratio to obtain a penalized point estimate of γ (denoted as γGPF). Fieller's method and the penalized Fieller's (PF) method are used to obtain the corresponding confidence intervals, respectively. Finally, we incorporate the constraint γ ∈ [0, 2] and propose Bayesian methods to derive point estimates and credible intervals of γ, using a truncated normal prior and a uniform prior, respectively. These two methods are denoted as GBN (Bayesian method with a truncated normal prior distribution) and GBU (Bayesian method with a uniform prior distribution), and their point estimates are denoted as γGBN and γGBU, respectively.

Results: The simulation results show that γGPF and γGF may be extreme point estimates, and that the proportion of extreme point estimates is smaller for γGPF than for γGF. The Bayesian methods completely avoid extreme point estimates and have smaller mean squared errors in all the simulated cases, with γGBN having the smallest mean squared error. As for interval estimation, Fieller's method may produce discontinuous intervals, empty sets and noninformative intervals. The penalized Fieller's method avoids discontinuous intervals, but it may still produce empty sets and noninformative intervals. The Bayesian methods completely avoid all three types of intervals. The coverage probabilities of the GBN, the GBU and the PF methods are generally controlled around 95%, and the coverage probability of Fieller's method is controlled around 95% in all the simulated cases. Meanwhile, the mean, the median, the standard deviation and the interquartile range of the widths of the credible intervals derived by the GBN are the smallest in most situations. In the application to the Minnesota Center for Twin and Family Research data, one gene (TMEM47, P value = 2.32 × 10^-6) is significantly associated with the alcohol dependence composite score. Applying the proposed methods to this gene, we found that the credible or confidence intervals derived by the GBN, the GBU, the penalized Fieller's method and Fieller's method are (0.0023, 1.2380), (0.0337, 1.3083), (0.0562, 1.2410) and (0.0557, 1.3896), respectively. All of them contain 1, which indicates that, with respect to the alcohol dependence composite score, TMEM47 undergoes random X chromosome inactivation or escapes from X chromosome inactivation.

Conclusion: The point estimates and the corresponding interval estimation methods proposed in this study can be used to estimate γ from the perspective of multiple loci. The GBN performs best in both point estimation and interval estimation, and we recommend using the GBN to estimate γ in practical applications.
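
The genotype coding (0, γ, 2), the burden aggregation and the ratio-of-regression-coefficients idea described in the abstract can be illustrated with a minimal sketch. Everything beyond what the abstract states is an assumption made for illustration: the function names `burden_indicators` and `gamma_ratio_estimate`, the equal default weights, the linear trait model, the assumption of one common γ across loci, and the particular parameterization γ = 2·b1/(b1 + b2), which follows from writing the coded genotype as γ·I(Dd or DD) + (2 − γ)·I(DD). The thesis may define the burden variable and the ratio differently, and the penalized and Bayesian estimates are not shown.

```python
import numpy as np

def burden_indicators(genotypes, weights=None):
    """Collapse an (n_females, n_loci) matrix of minor-allele counts (0, 1, 2)
    into two weighted burden variables (hypothetical helper)."""
    g = np.asarray(genotypes)
    if weights is None:
        weights = np.ones(g.shape[1])  # assumption: equal weights by default
    w = np.asarray(weights, dtype=float)
    x1 = (g >= 1).astype(float) @ w  # weighted count of loci with genotype Dd or DD
    x2 = (g == 2).astype(float) @ w  # weighted count of loci with genotype DD
    return x1, x2

def gamma_ratio_estimate(trait, x1, x2):
    """Least-squares fit of trait ~ 1 + x1 + x2, then gamma = 2*b1 / (b1 + b2).

    Under the assumed coding, b1 = beta*gamma and b2 = beta*(2 - gamma),
    so b1 + b2 = 2*beta. The result is truncated to [0, 2], which is why
    this kind of estimate can land exactly on 0 or 2.
    """
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(design, np.asarray(trait, dtype=float), rcond=None)
    b1, b2 = coef[1], coef[2]
    raw = 2.0 * b1 / (b1 + b2)
    return float(np.clip(raw, 0.0, 2.0))

# Toy, noiseless data (not from the thesis): 200 females, 5 loci.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 5))
x1, x2 = burden_indicators(genotypes)
true_gamma = 0.6
trait = 1.0 + 0.5 * (true_gamma * x1 + (2.0 - true_gamma) * x2)
gamma_hat = gamma_ratio_estimate(trait, x1, x2)
```

Because the toy trait has no noise, `gamma_hat` recovers `true_gamma` up to floating-point error. With noisy data the denominator b1 + b2 can be close to zero, which is the situation the Fieller-type intervals and the penalized correction described above are meant to handle.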
Keywords/Search Tags:Gene, Skewed X chromosome inactivation, Fieller’s method, Penalized Fieller’s method, Bayesian method, Minnesota Center for Twin and Family Research data