
Research on Gene-Gene Interaction Detection Algorithms for Genome-Wide Association Studies

Posted on: 2019-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Peng
GTID: 2370330545977175
Subject: Computer software and theory
Abstract/Summary:
Genome-wide association studies (GWAS) are an effective way to reveal the pathogenic genes of complex diseases by analyzing genotype information at SNP sites together with phenotype information for the related diseases. Currently, the dominant GWAS approach is to compute the statistical association between a disease and individual SNP sites. However, complex human diseases are often caused by gene-gene interactions. A large number of studies have shown that many common human diseases, such as breast cancer, diabetes mellitus and coronary disease, are closely related to gene-gene interactions. Statistical methods that focus on single SNP sites cannot detect all of these interactions. Detecting gene-gene interactions helps to understand gene function, to discover potential drug targets and to uncover the genetic mechanisms of complex human diseases.

With the rapid development of genotyping technology, the phenotype information and genome-wide genotype information available for human individuals are increasing exponentially, and the detection of high-dimensional gene-gene interactions faces great computational challenges. Machine learning, which uses computers to simulate human cognitive processes, is one possible way to deal with the problem. Machine learning algorithms can find high-dimensional nonlinear interactions by learning from large amounts of data, without assuming a specific gene-gene interaction model in advance. Over the last two decades, many machine learning methods have been used to detect gene-gene interactions, with some success. However, genetic heterogeneity, population stratification and interactions involving numerous SNP sites are the major factors that limit the performance of machine learning in detecting gene-gene interactions. Because genetic data are high-dimensional and usually noisy, current methods for detecting gene-gene interactions are extremely time-consuming.

To deal with this problem, this paper proposes a new algorithm, CP-SVM, for detecting gene-gene interactions in genome-wide association studies. CP-SVM combines a machine learning model with a Cartesian product algorithm to avoid both the heavy computational burden of high-dimensional genetic data and the multiple-testing correction problem of exhaustive searches. We compared CP-SVM with MDR, RF and other machine learning algorithms on simulated data, and the experimental results showed that CP-SVM achieves better classification performance and requires less computing time. We also used CP-SVM to analyze a real AMD disease dataset; the results are consistent with existing research, which further supports the validity of CP-SVM. CP-SVM not only found the SNP combination identified by the other methods but also found other pathogenic SNP combinations, which may help with clinical diagnosis in the future.
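
The abstract does not spell out how CP-SVM is built, so the following is only a minimal sketch of the general idea behind Cartesian-product encoding, not the thesis's algorithm. It assumes genotypes coded 0/1/2, a hypothetical toy phenotype rule (case when the two genotypes are equal), and a plain Pearson chi-square statistic standing in for the SVM. The function names `chi_square` and `cartesian_feature` are illustrative. On this toy data, each SNP alone shows no association with the phenotype, while the combined pair feature separates cases from controls.

```python
from collections import Counter
from itertools import product


def chi_square(feature, labels):
    """Pearson chi-square statistic for a categorical feature vs. a label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, l in product(feature_totals, label_totals):
        expected = feature_totals[f] * label_totals[l] / n
        stat += (joint[(f, l)] - expected) ** 2 / expected
    return stat


def cartesian_feature(snp_a, snp_b):
    """Map each genotype pair (a, b) to one of 3 x 3 = 9 combined categories."""
    return [3 * a + b for a, b in zip(snp_a, snp_b)]


# Toy data (an assumption for illustration): every genotype pair occurs
# 10 times, and an individual is a case (1) only when both genotypes match.
pairs = list(product(range(3), repeat=2)) * 10
snp_a = [a for a, _ in pairs]
snp_b = [b for _, b in pairs]
labels = [1 if a == b else 0 for a, b in pairs]

single_a = chi_square(snp_a, labels)   # single-SNP test on SNP A
single_b = chi_square(snp_b, labels)   # single-SNP test on SNP B
combined = chi_square(cartesian_feature(snp_a, snp_b), labels)

print(f"SNP A alone: {single_a:.2f}")
print(f"SNP B alone: {single_b:.2f}")
print(f"Cartesian-product feature: {combined:.2f}")
```

In a pipeline closer to what the abstract describes, the combined categories would presumably be fed to a classifier such as an SVM rather than to a chi-square test. That substitution is an assumption made here to keep the example dependency-free.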
Keywords/Search Tags: Gene-Gene Interaction, Genome-wide Association Analysis, Single Nucleotide Polymorphisms, Machine Learning