Epistatic Models Constructing And Optimization Of Learning In Genome-Wide Association Studies

Posted on: 2014-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: S S Li
Full Text: PDF
GTID: 2230330392460859
Subject: Control Science and Engineering
Abstract/Summary:
The main goal of geneticists is to find susceptibility mutation sites and the related disease-causing mechanisms, and then to use this knowledge for disease prevention and so contribute to human health. Gene-gene interactions have been recognized as a major component of human heredity, and learning multi-locus models can greatly help us understand the nature of common human diseases. However, finding critical mutation loci in huge amounts of biological nucleotide data has proved very difficult. Two big challenges remain: how to build a model with appropriate evaluation criteria to represent the correlation between a susceptibility mutation subset and the corresponding disease, and how to learn it as accurately and quickly as possible from an almost astronomically large space of combinatorial gene-gene interaction models. In this study, we construct a Decision model to represent the relationship between a mutation subset and the disease, and we use three efficient evaluation scores to measure the correlation within it. In addition, we provide two variants of ant colony optimization strategies to search for gene-gene interactions heuristically. The work and novelties in this dissertation are as follows.

First, we propose a generalized Decision model to show the essential principle of how a classification model is constructed to search for a susceptibility SNP subset: it looks for the genotype combinations that best distinguish samples with different disease states. We then use three evaluation scores, Conditional Entropy, the Gini coefficient and a Bayesian score, to measure the model’s ability to identify critical mutation sites. We compared the performance of these scores on a wide range of simulated datasets and on a real high-dimensional GWA dataset, a late-onset Alzheimer’s disease dataset. We concluded that Conditional Entropy and the Gini coefficient can predict the deleterious mutation sites faster than the Bayesian score but have weaker detection ability. When used to detect weakly associated genetic models, Conditional Entropy and the Gini coefficient perform better in both detection ability and computational efficiency. The detection ability of all three scores decreased on unbalanced datasets. Results on the real GWA dataset show that the Bayesian score and Conditional Entropy can detect functional loci found in previous studies, and that our Decision model with appropriate evaluation criteria can be applied successfully to detect deleterious non-synonymous mutation subsets in real GWA datasets.

Second, we propose two variants of ant colony optimization methods to search stochastically for susceptibility multi-locus mutation subsets. The first strategy searches for genetic models without restricting their order, that is, the number of loci they contain, so it is highly flexible. We proposed a stopping rule to speed up its convergence and then studied how to set the related parameters to balance the accuracy and computational efficiency of this strategy. The second strategy acts as a filter method: it first selects some high-priority loci and then performs an exhaustive search for high-order gene-gene interactions within the selected subset of loci. Experiments on both simulated datasets and the real GWA dataset showed that our method runs efficiently while maintaining its accuracy in predicting functional loci.
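To make the score-based evaluation concrete, the sketch below shows one way two of the three scores could be computed for a candidate SNP subset. It is an illustration under stated assumptions, not the dissertation's implementation: the function names (`conditional_entropy`, `gini_score`), the 0/1/2 genotype coding, the 0/1 case-control labels and the toy dataset are all hypothetical, and the Gini score is taken here to be the sample-weighted Gini impurity of disease status within each genotype combination. The Bayesian score is omitted because the abstract does not specify its form.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def _cells(genotypes, labels, snps):
    """Group disease labels by the genotype combination observed at `snps`."""
    cells = defaultdict(Counter)
    for row, label in zip(genotypes, labels):
        cells[tuple(row[i] for i in snps)][label] += 1
    return cells


def conditional_entropy(genotypes, labels, snps):
    """H(disease | genotype combination) in bits; lower means stronger association."""
    n = len(labels)
    h = 0.0
    for counts in _cells(genotypes, labels, snps).values():
        m = sum(counts.values())  # samples carrying this genotype combination
        h -= sum(c / n * log2(c / m) for c in counts.values())
    return h


def gini_score(genotypes, labels, snps):
    """Sample-weighted Gini impurity of disease status per cell; lower is better."""
    n = len(labels)
    g = 0.0
    for counts in _cells(genotypes, labels, snps).values():
        m = sum(counts.values())
        g += m / n * (1.0 - sum((c / m) ** 2 for c in counts.values()))
    return g


# Hypothetical toy data: disease status is the XOR of SNP 0 and SNP 1,
# while SNP 2 carries no signal. Each row is one sample.
genotypes = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
             (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
labels = [0, 0, 1, 1, 1, 1, 0, 0]

# Exhaustive search over all two-locus models, ranked by conditional entropy.
best = min(combinations(range(3), 2),
           key=lambda s: conditional_entropy(genotypes, labels, s))
print("best pair:", best)
```

On this toy data neither SNP 0 nor SNP 1 is informative alone, yet the pair separates cases from controls perfectly. This is the kind of purely epistatic signal that single-locus tests miss and that multi-locus scoring is meant to capture. The exhaustive loop over pairs is only feasible for small sets of loci, which is why the abstract pairs such scores with a heuristic search.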
Keywords/Search Tags: Gene-gene interactions, GWAS, epistasis, SNP, ACO, Evaluation criterion, Classification learning