Single Nucleotide Polymorphisms Features Associated With Genetic Diseases And Traits

Posted on: 2019-12-13 | Degree: Master | Type: Thesis
Country: China | Candidate: Y C Yao | Full Text: PDF
GTID: 2404330545497957 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms caused by variation at a single nucleotide in the chromosomal genome. Research on SNP association is of great significance for locating pathogenic genes and uncovering the genetic mechanisms of complex diseases. To handle the high-dimensional, small-sample nature of SNP data, this thesis builds a model for detecting pathogenic loci, a model for analyzing the association between loci and genetic diseases or traits, and an effective dimension-reduction scheme for massive data, taking full account of the strong correlations among SNPs, between SNPs and genetic diseases, and between traits and loci.

1) Using the idea of a typing value, a simple numerical code is created: the base pairs at each locus are recoded so that the genotype at each locus maps to the number 0, 1, or 2. A quality-control model based on minor allele frequency and Hardy-Weinberg equilibrium then filters out the SNP loci that do not conform to genetic equilibrium.

2) Association is tested one locus at a time. The chi-square test and logistic regression are used to select pathogenic SNPs; together they screen out ten SNP loci with higher pathogenicity, and false positives are controlled with the Bonferroni correction. A random forest algorithm is also applied to the genetic data set: its variable-importance measure gives each SNP a score, and the higher the score, the stronger the association between the locus and the disease and the more likely the locus is a pathogenic SNP. The disease-associated locus identified this way is rs2273298.

3) The relationship between multiple genetic loci and multiple traits is examined. A correlation analysis first shows that these traits are highly correlated with one another. The metaCCA algorithm is then applied, and the locus found to be highly associated with these traits is rs12746773.
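The recoding and quality-control step in 1) can be illustrated with a small stdlib-only sketch. It is not the thesis's code: the genotype-string format (e.g. "AG"), the function names `encode_locus`, `hwe_pvalue` and `passes_qc`, and the thresholds (MAF of at least 0.05, HWE p-value of at least 1e-4) are all hypothetical choices made for illustration. The Hardy-Weinberg check uses a standard 1-degree-of-freedom chi-square goodness-of-fit test, and it assumes a biallelic locus.

```python
from collections import Counter
from math import erfc, sqrt


def encode_locus(genotypes):
    """Encode genotypes such as 'AG' as 0, 1 or 2 copies of the minor allele.

    Assumes a biallelic locus. Returns (codes, minor allele frequency).
    """
    allele_counts = Counter(base for g in genotypes for base in g)
    if len(allele_counts) < 2:  # monomorphic locus: no minor allele
        return [0] * len(genotypes), 0.0
    minor = min(allele_counts, key=allele_counts.get)  # rarer allele
    codes = [g.count(minor) for g in genotypes]
    maf = sum(codes) / (2 * len(genotypes))
    return codes, maf


def hwe_pvalue(codes):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium."""
    n = len(codes)
    observed = [codes.count(k) for k in (0, 1, 2)]
    q = sum(codes) / (2 * n)  # minor allele frequency
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_qc(genotypes, maf_threshold=0.05, hwe_alpha=1e-4):
    """Keep a locus only if it is common enough and consistent with HWE."""
    codes, maf = encode_locus(genotypes)
    return maf >= maf_threshold and hwe_pvalue(codes) >= hwe_alpha


locus = ["AA", "AG", "GG", "AG", "AA", "AA", "AG", "AA"]
print(encode_locus(locus))  # ([0, 1, 2, 1, 0, 0, 1, 0], 0.3125)
print(passes_qc(locus))     # True
print(passes_qc(["AG"] * 100))  # False: no homozygotes at all violates HWE
```

With only 1 degree of freedom, the chi-square upper tail reduces to `erfc(sqrt(x / 2))`, so no statistics library is needed. For small samples, an exact HWE test is usually preferred over the chi-square approximation.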
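The single-locus screening with Bonferroni control in 2) can be sketched in a few lines. The abstract does not say which chi-square formulation was used, so this sketch assumes one plausible option: a 2x3 genotypic test of cases versus controls on the 0/1/2 codes. The function names, the example data, and the family-wise alpha of 0.05 are hypothetical. The sketch does not cover logistic regression or the random-forest importance ranking.

```python
from math import exp


def genotypic_chi2_pvalue(case_codes, control_codes):
    """2x3 chi-square test of genotype counts (codes 0/1/2) in cases vs controls.

    Uses 2 degrees of freedom; if a genotype is absent from both groups the
    true df is smaller, which this sketch does not adjust for.
    """
    table = [[codes.count(k) for k in (0, 1, 2)]
             for codes in (case_codes, control_codes)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(3)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return exp(-chi2 / 2)  # upper tail of chi-square with 2 df


def bonferroni_hits(pvalues, alpha=0.05):
    """Indices of loci still significant after Bonferroni correction."""
    threshold = alpha / len(pvalues)
    return [i for i, p in enumerate(pvalues) if p < threshold]


# Locus 0: genotype 2 enriched in cases; locus 1: identical distributions.
cases = [[0] * 20 + [1] * 40 + [2] * 40, [0] * 30 + [1] * 50 + [2] * 20]
controls = [[0] * 50 + [1] * 40 + [2] * 10, [0] * 30 + [1] * 50 + [2] * 20]
pvalues = [genotypic_chi2_pvalue(c, k) for c, k in zip(cases, controls)]
print(bonferroni_hits(pvalues))  # [0]
```

With m loci tested, Bonferroni requires each p-value to fall below alpha / m. This bound is conservative but simple, which is why it is a common first choice when screening many SNPs.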
Keywords/Search Tags: GWAS, Random Forest, Classification