
Research On The Integrated Statistical Method Of Genome-wide Association Study Based On Pleiotropy

Posted on: 2024-03-20, Degree: Master, Type: Thesis
Country: China, Candidate: G R Wang
GTID: 2530307085499464, Subject: Applied Statistics
Abstract/Summary:
Genome-wide association study (GWAS) is a method for searching the human genome for variants associated with disease. GWAS deepens our understanding of complex diseases and supports their prevention, diagnosis, and treatment. Over the past decades GWAS research has made great progress: tens of thousands of variants have been found to be significantly associated with complex diseases. These variants, however, explain only a small share of disease variation, which is still driven by many variants whose associations have low significance.

Identifying risk variants and improving the accuracy of risk prediction usually requires a large sample, and a single GWAS dataset is usually too small to meet this requirement. In recent years, with the development of the Internet, more and more GWAS results have been published. These results contain no personal information or individual-level data, only summary statistics on the relationship between traits and variants. Because individual-level data are difficult and costly to obtain, more and more researchers are exploring how to use freely available GWAS summary statistics to assist the analysis of individual-level data.

Methods of this type were initially based on the linear mixed model, which assumes a linear relationship between variants and traits or diseases. To bring summary statistics into the analysis of individual-level data, such methods assume a relationship between the variant-disease (trait) associations underlying the individual-level data and those implied by the summary statistics. This includes the homogeneity assumption, under which both data sources concern the same population and disease, and the heterogeneity assumption, under which they concern different populations or related diseases linked through genetic pleiotropy. Because the binary variable used to represent a complex disease may violate basic assumptions of the linear regression model, some studies have extended the homogeneity-based linear mixed model to a logistic regression model, with good results. How to develop a logistic regression model under the heterogeneity assumption, so that it applies to a wider range of scenarios, remains a challenging problem and motivates this work.

This paper proposes a logistic regression model that uses gene pleiotropy to integrate individual-level data and summary statistics. Starting from a logistic regression model for individual-level data, it uses pleiotropy information (the same gene affecting multiple diseases or traits) to model the association between the same variant and different diseases or traits: the trait corresponding to the individual-level data and the traits or diseases corresponding to the summary statistics. The summary statistics of related diseases are thereby incorporated into the original logistic regression model. Compared with the pleiotropy-based linear model for integrated genome-wide association analysis, the logistic regression model is better suited to complex diseases represented by a binary variable. Compared with the logistic regression integrated analysis model based on the homogeneity assumption, it drops that assumption and can integrate and analyze multiple related diseases, giving it a wider range of application. An efficient algorithm for the model is developed based on variational expectation-maximization inference.

Through simulation experiments and an analysis of real Crohn's disease data, this paper finds that the proposed pleiotropy-based integrated analysis model outperforms the logistic regression model that analyzes individual-level data alone and, when more than one summary statistics dataset is available, also outperforms the integrated genome-wide association analysis method based on the homogeneity assumption. The improvements, of varying degree, are in the ability to identify risk variants and to predict disease risk.
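To make the distinction between individual-level data and summary statistics concrete, the following is a minimal sketch, not the model proposed in the thesis. It simulates two related binary traits that share causal variants (a simple stand-in for pleiotropy) and reduces each cohort to per-variant z-scores using the standard score test for a logistic model. All function names, sample sizes, allele frequencies, and effect sizes are illustrative assumptions.

```python
import numpy as np

def simulate_genotypes(n_samples, n_variants, maf, rng):
    """Draw additive genotype counts (0, 1, 2) for unrelated individuals."""
    return rng.binomial(2, maf, size=(n_samples, n_variants)).astype(float)

def simulate_case_control(genotypes, effects, intercept, rng):
    """Draw a binary disease status from a logistic model on the genotypes."""
    logits = intercept + genotypes @ effects
    prob = 1.0 / (1.0 + np.exp(-logits))
    return rng.binomial(1, prob)

def score_test_z(genotypes, status):
    """Per-variant score-test z-statistics for a binary trait.

    Each variant is tested marginally against an intercept-only
    logistic null model; only these z-scores, not the genotypes,
    would be shared as summary statistics.
    """
    centered = genotypes - genotypes.mean(axis=0)
    case_rate = status.mean()
    score = centered.T @ (status - case_rate)
    variance = case_rate * (1.0 - case_rate) * (centered ** 2).sum(axis=0)
    return score / np.sqrt(variance)

rng = np.random.default_rng(0)
n_variants = 50
causal = np.arange(5)  # assumed: the same five variants affect both traits

# Shared causal variants with trait-specific effect sizes (hypothetical values).
effects_a = np.zeros(n_variants)
effects_a[causal] = 0.5
effects_b = np.zeros(n_variants)
effects_b[causal] = 0.3

# Cohort A: individual-level data for the trait of interest.
geno_a = simulate_genotypes(2000, n_variants, 0.3, rng)
status_a = simulate_case_control(geno_a, effects_a, -1.0, rng)

# Cohort B: a related trait, released only as summary statistics.
geno_b = simulate_genotypes(5000, n_variants, 0.3, rng)
status_b = simulate_case_control(geno_b, effects_b, -1.0, rng)
z_b = score_test_z(geno_b, status_b)

print("mean |z| at shared causal variants:", np.abs(z_b[causal]).mean())
print("mean |z| at null variants:", np.abs(z_b[len(causal):]).mean())
```

In this toy setting the z-scores of cohort B carry information about which variants matter for the trait in cohort A, which is the kind of signal an integrated analysis aims to exploit.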
Keywords/Search Tags: Genome-wide Association Study, Gene Pleiotropy, Individual Level Sample Data, Summary Statistics, Logistic Regression, Variational Inference