
Genome Wide Association Study Based On Convolutional Neural Networks

Posted on: 2024-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
GTID: 2530307100966139
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Complex diseases seriously threaten both physical and mental health. They are usually caused by a combination of genetic and environmental effects, of which the genetic effect plays an important role. A single nucleotide polymorphism (SNP), known as the third-generation genetic marker, is a DNA sequence polymorphism caused by variation of a single nucleotide in the genome. According to statistics, SNPs account for more than 90% of all known polymorphisms. Most of the differences in drug sensitivity, disease predisposition and phenotypic traits in humans are associated with SNPs, which makes SNPs the best candidates for disease association studies. A genome-wide association study (GWAS) is a strategy for discovering genetic variants that affect complex traits through genome-wide comparison, by case-control or correlation analysis, using SNPs as the genetic markers. GWAS helps to locate associated genes, uncover genetic mechanisms and support personalized medicine for complex diseases.

Deep neural networks have been shown to outperform traditional machine learning algorithms in most applications and can model complex, large-scale data sets. The convolutional neural network (CNN) is a representative deep learning model. Although GWAS has made great progress in recent years, data conversion and model interpretation still need further study. This study performs GWAS on bipolar disorder (BD) and coronary heart disease with CNN-based GWAS models, and interprets the models with gradient-weighted class activation mapping (Grad-CAM) to find SNPs and genes associated with the diseases. The work is as follows:

1) A data conversion method based on color images was proposed. A CNN GWAS model was constructed on simulated data, and experiments were conducted to compare the method with existing methods. The experiments verified the feasibility of the method.
2) GWAS models of bipolar disorder and coronary heart disease were constructed separately to verify the new method on real data. The results show that the models performed well on both real data sets and outperformed the existing methods on the same data.
3) The two models were interpreted with Grad-CAM. The associated SNPs were screened first, the genes containing those SNPs were then retrieved from NCBI, and finally the genes were compared against the GeneCards database. The results show that the accuracy of the proposed method is better than that of the existing method on the same BD data. In addition, comparative experiments with a logistic regression association analysis model show that the proposed method screens susceptibility genes with much higher accuracy than the logistic regression model, which further verifies the feasibility of the method.
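The conversion and interpretation steps described above can be illustrated with a minimal sketch. The abstract does not give the actual color coding, image layout or network, so everything here is an assumption: the genotype codes (0, 1, 2 for the minor-allele count), the `GENOTYPE_TO_RGB` mapping, the square row-by-row layout, and the helper names `snps_to_image`, `grad_cam` and `top_snp_indices` are all hypothetical. The `grad_cam` function follows the standard Grad-CAM definition (channel weights are the spatial mean of the gradients, followed by a ReLU over the weighted sum), and takes activations and gradients that a trained CNN would supply.

```python
import math

# Hypothetical colour code: the abstract does not state the actual mapping.
GENOTYPE_TO_RGB = {
    0: (255, 0, 0),  # homozygous for the major allele
    1: (0, 255, 0),  # heterozygous
    2: (0, 0, 255),  # homozygous for the minor allele
}
MISSING_RGB = (0, 0, 0)  # used for missing calls and for padding


def snps_to_image(genotypes):
    """Arrange one sample's genotype codes row by row into a square RGB image."""
    side = math.isqrt(len(genotypes))
    if side * side < len(genotypes):
        side += 1
    pixels = [GENOTYPE_TO_RGB.get(g, MISSING_RGB) for g in genotypes]
    pixels += [MISSING_RGB] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def grad_cam(activations, gradients):
    """Grad-CAM heat map from K feature maps and the class-score gradients.

    Both arguments are K x H x W nested lists taken from one conv layer.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of that channel's gradients.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w_k, a_k in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * a_k[i][j]
    # ReLU keeps only positions that raise the class score.
    return [[max(0.0, v) for v in row] for row in cam]


def top_snp_indices(cam, n_snps, top_k):
    """Rank SNP positions by heat-map value.

    Assumes the map has already been resized to the input image resolution,
    so that pixel i (row-major) corresponds to SNP i.
    """
    flat = [v for row in cam for v in row][:n_snps]
    return sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:top_k]
```

In a real pipeline the heat map comes from a convolutional layer that is usually coarser than the input image, so it would have to be upsampled before pixels can be mapped back to SNP indices. The ranked SNPs would then be looked up in NCBI and GeneCards as the abstract describes.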
Keywords/Search Tags: Genome-wide association study (GWAS), Single nucleotide polymorphism (SNP), Convolutional neural network (CNN), Color-image-based conversion, Gradient-weighted class activation mapping (Grad-CAM)