Research On The Integrated Statistical Method Of Genome-wide Association Study Based On Pleiotropy

Posted on:2024-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:G R Wang

Full Text:PDF

GTID:2530307085499464

Subject:Applied Statistics

Abstract/Summary:

A genome-wide association study (GWAS) searches the human genome for variants associated with disease. GWAS research deepens our understanding of complex diseases and supports their prevention, diagnosis, and treatment. Over the past decades, GWAS have identified tens of thousands of variants that are significantly associated with complex diseases. These variants nevertheless explain only a small part of disease variation, which is also influenced by many variants with weak, low-significance associations. Identifying risk variants and improving the accuracy of risk prediction usually requires very large samples, and a single GWAS dataset is often too small to meet this requirement.

In recent years, with the development of the Internet, a growing number of GWAS results have been published. These results contain no personal information or individual-level data, only summary statistics on the association between traits and variants. Because individual-level data are difficult and costly to obtain, more and more researchers are exploring how to use freely available GWAS summary statistics to support the analysis of individual-level data. Methods of this type were initially based on the linear mixed model, which assumes a linear relationship between variants and traits or diseases. To bring summary statistics into the analysis of individual-level data, these methods assume a relationship between the variant-disease (trait) associations underlying the individual-level data and those implied by the summary statistics. This includes a homogeneity assumption, under which both data sources concern the same population and disease, and a heterogeneity assumption, based on genetic pleiotropy, under which they concern different populations or related diseases. Because the binary variable used to represent a complex disease may violate basic assumptions of the linear regression model, some studies have extended the homogeneity-based linear mixed model to a logistic regression model, with good results. How to develop a logistic regression model under the heterogeneity assumption, so that it applies to a wider range of scenarios, remains a challenging problem and motivates this work.

This thesis proposes a logistic regression model that uses genetic pleiotropy to integrate individual-level data and summary statistics. Starting from a logistic regression model for individual-level data, it uses pleiotropy (the same gene affecting multiple diseases or traits) to model the association between the same variant and different diseases or traits: the trait in the individual-level data and the traits or diseases in the summary statistics. The summary statistics of related diseases are thereby incorporated into the original logistic regression model. Compared with the pleiotropy-based linear model for integrative GWAS analysis, the logistic regression model is better suited to complex diseases represented by a binary variable. Compared with the homogeneity-based logistic regression integrative model, it drops the homogeneity assumption and can jointly analyze multiple related diseases, which gives it a wider range of application. An efficient algorithm for the model is developed based on variational expectation-maximization (EM) inference.

Simulation experiments and an analysis of real Crohn's disease data show that the proposed pleiotropy-based integrative model outperforms both the logistic regression model that analyzes individual-level data alone and, when more than one summary statistics dataset is available, the integrative GWAS method based on the homogeneity assumption. It improves, to varying degrees, both the identification of risk variants and the prediction of disease risk.
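
The general idea of letting summary statistics from a related trait inform a logistic regression on individual-level data can be illustrated with a small sketch. This is not the thesis's model or its variational EM algorithm: it is a simplified stand-in that fits a MAP logistic regression by gradient ascent, with a Gaussian prior whose variance is larger for variants flagged by the summary p-values. All function names, the p-value threshold, the prior variances, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def prior_variances_from_summary(summary_p, threshold=0.05, wide=1.0, narrow=0.01):
    # Hypothetical rule: variants with a small p-value for a related trait
    # get a wide prior (weak shrinkage); all others get a narrow prior.
    return np.where(summary_p < threshold, wide, narrow)


def fit_map_logistic(X, y, prior_var, n_iter=500, lr=0.1):
    # MAP estimate for logistic regression with independent
    # N(0, prior_var[j]) priors on the variant effects (illustrative only).
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_iter):
        resid = y - sigmoid(intercept + X @ beta)
        # Gradient of the log-posterior, scaled by 1/n for a stable step size.
        grad_beta = X.T @ resid / n - beta / (prior_var * n)
        beta += lr * grad_beta
        intercept += lr * resid.mean()
    return intercept, beta


def simulate(n=400, p=20, n_causal=3, seed=0):
    # Simulated genotypes (0/1/2 allele counts), standardized per variant.
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=p)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    beta_true = np.zeros(p)
    beta_true[:n_causal] = 0.8
    y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
    # Made-up summary p-values for a related trait: the causal variants are
    # assumed to be pleiotropic, so they also look significant there.
    summary_p = rng.uniform(size=p)
    summary_p[:n_causal] = 1e-6
    return X, y, beta_true, summary_p


if __name__ == "__main__":
    X, y, beta_true, summary_p = simulate()
    prior_var = prior_variances_from_summary(summary_p)
    intercept, beta = fit_map_logistic(X, y, prior_var)
    print("mean |effect|, causal variants:", np.abs(beta[:3]).mean())
    print("mean |effect|, other variants: ", np.abs(beta[3:]).mean())
```

In this sketch the summary statistics enter only through the prior variances, so variants without support from the related trait are shrunk more strongly toward zero. A full treatment along the lines described in the abstract would instead model the shared association status as a latent variable and estimate it jointly with the effects.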

Keywords/Search Tags:

Genome-wide Association Study, Gene Pleiotropy, Individual Level Sample Data, Summary Statistics, Logistic Regression, Variational Inference

Related items

1	Comparisons Of Linkage Disequilibrium Matrix Estimation Methods In Summary Data-Based Transcriptome-Wide Association Analysis
2	Research On Robust Tests In Genes Association Studies
3	On The Equivalence Of Using Summary Statistics Versus Individual Level Data In Meta-analysis
4	Construction And Analysis Of Complex Networks For Genome-wide Association Data
5	Research On Gene-gene Interaction Detection Algorithms For Genome-wide Association Studies
6	The Research Of Rare Variants Based On The Genome-wide Association Study
7	Research On Genome-Wide Association Analysis Algorithms Based On MDR
8	Application Of Tag SNP-set Analytical Method In Genome Wide Association Study
9	Study On Genome-wide Association Analysis And Comparable Performance Using Decision-tree-based Methods
10	An Optimal Principal Component Regression For Genomic Control In Genome-wide Association Analysis