Incorporating Biological Knowledge Prior Into Bayesian Shrinkage Model And Its Application In Genetic Association Studies

Posted on: 2020-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X H Ji
GTID: 2370330590997679
Subject: Epidemiology and Health Statistics
Abstract/Summary:
BACKGROUND
With the rapid development of next-generation sequencing, whole exome sequencing (WES) and whole genome sequencing (WGS) have generated large amounts of data, making it possible for researchers to explore the variants associated with complex diseases and traits. Although many statistical methods for genetic association studies have been proposed to handle these high-dimensional genetic data, few pathogenic variants associated with complex diseases have been found, owing to insufficient statistical power. Statistical methods therefore still face huge challenges. Recently, researchers have suggested that one way to improve the power of genetic association studies is to incorporate biological information on SNPs (single nucleotide polymorphisms) into the statistical models used for association analysis. Meanwhile, with the development of bioinformatics technologies, a large number of public functional annotation databases have emerged that provide a variety of biological information priors for genetic variants. Therefore, to address the low power of genetic association studies, this study proposed incorporating prior scores from multiple sources of biological information into the Bayesian Shrinking Model to estimate SNP effects, so as to improve the discovery rate of SNPs associated with complex diseases.

METHODS
In this study, a Bayesian Shrinking Model combined with biological information priors was used for genetic association analysis. We explored the performance of the Bayesian Shrinking Model in three scenarios: no shrinkage (using large scale parameters, which is equivalent to no shrinkage), a fixed shrinkage for all SNPs, and a variant-specific shrinkage determined by a prior score of biological information. The variant-specific shrinkage parameter of each SNP was predicted from the SIFT, RegulomeDB and CADD (Combined Annotation Dependent Depletion) databases, and the resulting score was then mapped linearly into the range of the shrinkage parameter.

All scenarios were applied to data from the Genetic Analysis Workshop 19 (GAW19), which included a continuous phenotype data set and a binary phenotype data set. The GAW19 data came from the Texas Medical Research Center.

The continuous phenotype data selected in this study was one of 200 simulated phenotype data sets and contained 1943 observations. The phenotype was simulated systolic blood pressure (SBP), which was related to true SBP. The genetic data were taken from chromosome 3; 4907 SNPs remained after removing SNPs with a MAF (minor allele frequency) less than 0.01 or an HWE (Hardy-Weinberg equilibrium) test P value less than 0.05. The four SNPs most strongly associated with SBP were selected, and six non-associated SNPs were randomly selected from the remaining SNPs. After mapping to linkage disequilibrium (LD) regions, a total of 115 SNPs were obtained in two "True blocks" and six non-"True blocks". This study used the average ranking of the "True blocks" and the variation ratio of the "Top block" as the model evaluation indexes. The higher the "True blocks" ranked on average (that is, the smaller the average rank), the better the model performed.

The binary phenotype data selected was real next-generation sequencing data from GAW19, and the phenotype was true blood pressure status (with/without hypertension). After excluding 92 individuals without phenotype data, there were 1851 observations in total, comprising 427 cases and 1424 controls. Based on a pathway strategy, 96 genes and 12251 SNPs in the renin-angiotensin-aldosterone system (RAAS) were selected and mapped to the odd-numbered chromosomes of GAW19. After excluding SNPs with MAF < 0.01 or HWE P value < 0.05, 318 SNPs in 249 LD regions were finally obtained.

RESULTS
1. For the continuous simulated phenotype data, in the no-shrinkage scenario the best average ranking of the "True blocks" was the same for the Bayesian Shrinking Model and the linear model, namely 5.00. When all SNPs were given the same fixed shrinkage, the average ranking of the Bayesian Shrinking Model was larger (worse) than 5.00 across the shrinkage parameters tested. When the variant-specific shrinkage defined by the CADD score was incorporated into the Bayesian Shrinking Model, the best average ranking of the "True blocks" was 4.50, compared with 5.00 for the linear model. However, when the SIFT scores and the RegulomeDB scores were incorporated, the best average rankings were 5.50 and 14.50, respectively.
2. In the variant-specific shrinkage scenario, when the average ranking of the "True blocks" was optimal, the upper bound of the shrinkage parameter was 0.001 and the lower bound ranged from 0.0001 to 0.000001. In this best shrinkage range, the variation ratio of the "Top block" reached its maximum of 5557.50.
3. For the binary real phenotype data, when the CADD prior score was used to define the variant-specific shrinkage, the upper bound of the best shrinkage parameter, as determined by the variation ratio of the "Top block", was also 0.001. In the best shrinkage range, the Bayesian Shrinking Model identified 15 hypertension-associated SNPs, while the logistic model identified 11 associated SNPs, with 9 SNPs identified by both, indicating that the Bayesian Shrinking Model may have greater statistical power.

CONCLUSIONS
1. The Bayesian Shrinking Model with variant-specific shrinkage was better than the linear model at detecting associated SNPs. When no shrinkage or the same shrinkage was applied to all SNPs, the Bayesian Shrinking Model did not perform better.
2. The Bayesian Shrinking Model combined with the CADD score showed good performance when applied to genetic association studies. This suggests that, in genetic association studies, the CADD database can be used to predict the function of genetic variants, and that the CADD score is an appropriate biological information prior.
3. The variation ratio of the "Top block" was related to the average ranking of the "True blocks". When the variation ratio of the "Top block" was at its largest, the model reached the best shrinkage range. Therefore, in practical studies, the variation ratio is a good indicator for finding the best shrinkage range.
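
The linear mapping of prior scores into a shrinkage parameter range, as described in the methods, might look roughly like the sketch below. It is an illustration only, not the thesis's implementation: the function name `map_scores_to_scale`, the min-max rescaling, the direction of the mapping (a higher score receives the larger scale parameter, meaning less shrinkage), and the example scores are all assumptions. Only the bounds 0.000001 and 0.001 come from the reported results.

```python
def map_scores_to_scale(scores, lower, upper):
    """Linearly rescale prior scores into the interval [lower, upper].

    Assumption (not stated in the abstract): the SNP with the highest
    score gets the largest scale parameter, i.e. the least shrinkage.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Identical scores carry no SNP-specific information,
        # so every SNP gets the midpoint of the range.
        return [(lower + upper) / 2.0 for _ in scores]
    span = upper - lower
    return [lower + (s - lo) / (hi - lo) * span for s in scores]


# Hypothetical CADD-like scores for three SNPs (made-up values).
example_scores = [0.5, 12.0, 25.0]
scales = map_scores_to_scale(example_scores, lower=1e-6, upper=1e-3)
for score, scale in zip(example_scores, scales):
    print(f"score={score:5.1f} -> scale={scale:.3e}")
```

Under these assumptions, each SNP's scale would then serve as its own shrinkage parameter in the prior on its effect, in place of a single value shared by all SNPs.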
Keywords/Search Tags:bio-information prior, Bayesian Shrinking Model, Generalized Linear Model, genetic association studies