
Genetic Association Analysis Of Uncertain Genotype Data Based On Generalized Additive Model

Posted on: 2022-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2480306485989789
Subject: Applied Statistics
Abstract/Summary:
In the study of human genetic disease, genome-wide association analysis (GWAS) is a common method. GWAS typically genotypes high-density molecular markers, up to tens of millions of single nucleotide polymorphisms (SNPs), and examines the relationship between each SNP and a specific disease in order to identify disease-related genes. However, some genotypes are not decoded correctly, so that only genotype probabilities can be obtained, and the number of loci is large.

To study this setting, we use an R package to randomly generate loci with uncertain genotypes. Where part of the data at a locus is missing, the "dosage" is used to impute the missing values. Where the genotype is uncertain and no definite genotype is given, the "most likely genotype" is used, determined from the genotype probabilities. Earlier work analyzed the association between disease status (affected or unaffected) and loci with uncertain genotypes using traditional parametric linear models.

This thesis instead combines the B-spline method with a generalized linear model, namely logistic regression. The uncertain genotype data are expanded in a B-spline basis and, because the phenotype Y is discrete, a logistic regression model is built on the B-spline-transformed genotype data. The resulting generalized additive model describes the nonlinear relationship between the explanatory variables (the loci) and the response Y (affected or unaffected) nonparametrically. It avoids the strong constraints that a linear model imposes on parameter estimation and can be applied to real models in a more general sense. After the B-spline estimation, SCAD is applied to the processed loci for variable selection, in order to select the loci most closely associated with disease status. The method is analyzed on simulated models and on real data. It can be used to detect loci associated with disease status and the degree of that association, which has practical research significance.
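The three quantities named in the abstract (dosage, most likely genotype, and the SCAD penalty) can be illustrated with a minimal Python sketch. This is not the thesis's own code, which the abstract says was written in R. The function names are hypothetical, the genotype probabilities are assumed to be ordered as (0, 1, 2) copies of the counted allele, and the default a = 3.7 is the value commonly used for SCAD rather than one stated in the abstract.

```python
import numpy as np

def dosage(probs):
    """Expected allele count: 0*P(0 copies) + 1*P(1 copy) + 2*P(2 copies).

    Assumes the last axis holds the three genotype probabilities in that order.
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])

def most_likely_genotype(probs):
    """Genotype code (0, 1 or 2) with the highest probability."""
    probs = np.asarray(probs, dtype=float)
    return probs.argmax(axis=-1)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty evaluated elementwise on |beta| (requires a > 2, lam > 0)."""
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                            # |beta| <= lam
    quadratic = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    constant = np.full_like(b, lam**2 * (a + 1) / 2)            # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quadratic, constant))

# Hypothetical genotype probabilities for two individuals at one locus.
p = np.array([[0.1, 0.2, 0.7],
              [0.6, 0.3, 0.1]])
d = dosage(p)                 # continuous value in [0, 2] per individual
g = most_likely_genotype(p)   # hard genotype call per individual
```

The dosage keeps the uncertainty as a continuous covariate, while the most likely genotype discards it in favor of a hard call. The SCAD penalty grows linearly for small coefficients and levels off for large ones, so large effects are not shrunk further.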
Keywords/Search Tags: penalized variable selection, genotype uncertainty, B-spline, SCAD, logistic regression