Genome-Wide Interaction Study Of Single Nucleotide Polymorphisms

Posted on: 2014-04-07 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J L Shang | Full Text: PDF
GTID: 1260330398997840 | Subject: Computer application technology
Abstract/Summary:
With the successes of the Human Genome Project and the 1000 Genomes Project, the focus of complex disease research has shifted to Genome-Wide Association Studies (GWAS). GWAS are case-control studies that examine genetic variants, usually Single Nucleotide Polymorphisms (SNPs), in different individuals to see whether any variants are associated with complex diseases. SNPs are the most common form of genetic variation in the human genome, and they usually relate to complex diseases through their nonlinear interactive effects, namely epistasis or epistatic interactions. Epistatic interactions play an important role in interpreting the genetic basis of disease susceptibility and disease etiology, and in devising diagnostic tests and useful treatments. Although much work has been done on epistasis detection, algorithmic development is still ongoing because of the methodological and computational complexity of the problem. This thesis is concerned with the genome-wide SNP interaction study, and its main contributions are outlined below.

1. A SNP simulator, EpiSIM (Epistasis SIMulator), is developed to provide benchmark simulation data and new pathogenic models for epistasis detection. EpiSIM can simulate linkage disequilibrium (LD) patterns, haplotype blocks and minor allele frequencies well by applying a newly introduced notion, the average of adjacent LD levels, and by using prior knowledge such as Mendel’s law of inheritance, Hardy-Weinberg equilibrium and probability theory. EpiSIM expands the range of pathogenic models that current simulators offer, including epistasis models that display marginal effects and those that display no marginal effects. One or more of these pathogenic models can be embedded simultaneously into a single simulation data set, jointly determining the phenotype.
EpiSIM is independent of any outside data source in its Markov chain simulation processes, allows data properties to be specified directly, and provides several flexible options, such as batch processing and output transformation. Experiments demonstrate that EpiSIM is a valuable addition and complement to existing SNP simulators. EpiSIM supplies sufficient simulation data for the follow-up studies.

2. Though many epistasis detection methods have been proposed, few studies focus on comparing them, and their relative performance is still unclear. Here, a comprehensive comparison study is carried out by applying the related software packages to simulation data. For this purpose, current epistasis detection methods are categorized according to their search strategies, and five representative methods, TEAM, BOOST, SNPRuler, AntEpiSeeker and epiMODE, are selected for comparison. These methods are tested on simulation data of different sizes, with various pathogenic models, and with and without noise. The types of noise include missing data, genotyping error and phenocopy. Performance is evaluated by detection power, robustness, sensitivity and computational complexity. Experiments show that none of the selected methods is perfect in all scenarios and each has its own merits and limitations; in terms of overall performance, AntEpiSeeker and BOOST are recommended as efficient and effective methods. This comparison study may provide guidelines for applying these methods and further clues for epistasis detection.

3. Traditional SNP ranking methods, for example mutual information, the chi-squared test and SURF, usually have limited capability in prioritizing interacting SNPs. Here, an interacting-SNP ranking method based on co-information is presented, the core of which is a newly introduced relevance measure, CII (Co-Information Index).
The CII value takes into account the effects of a SNP on the phenotype, including its main effect and as many as possible of its marginal effects in the different SNP combinations that involve it. By analyzing the computational complexity and the upper threshold on the dimension of SNP combinations, an exhaustive strategy is recommended for computing the CII value on small-scale data, and a Monte Carlo sampling strategy is recommended for estimating it on large-scale data. Experiments demonstrate that the CII-based method is capable and is sometimes superior to traditional SNP ranking methods. The work provides a theoretical basis for the screening stage in the subsequent design of a multi-stage epistasis detection method.

4. A multi-stage epistasis detection method, EpiMiner (Epistasis Miner), is proposed based on co-information theory. It is composed of three stages: screening, testing and visualizing. In the screening stage, the CII-based method is employed to visualize and rank the contributions of individual SNPs to the phenotype. The number of top-ranking SNPs retained for the next stage is specified automatically by a support vector machine classifier. In the testing stage, co-information and a co-information-based permutation test are conducted sequentially to search for epistatic interactions within the retained SNPs, and the results are then ranked by their p-values. To characterize the broader epistasis landscape, networks are built in the visualizing stage by linking pairs of retained SNPs whose co-information values with respect to the phenotype are stronger than given thresholds. Experiments demonstrate that EpiMiner is effective in detecting and visualizing epistatic interactions, and might provide further clues for epistasis detection.

5. The performance of multi-stage epistasis detection methods largely depends on the SNPs retained by the screening stage.
To reduce this dependence, an ant colony optimization based two-stage epistasis detection method, AntMiner (Ant Miner), is introduced by incorporating heuristic information into the ant-decision rules. In the screening stage, the heuristic information is used to direct the ants in the search process, improving computational efficiency and solution accuracy. At the completion of the iteration process, both the highly suspected SNP combinations and the reduced set of SNPs with top-ranking pheromones are passed to the next stage. In the testing stage, the chi-squared test is conducted to search for the final epistatic interactions within the retained SNPs and within the highly suspected SNP combinations. Experiments show that AntMiner is promising for epistasis detection. This study may provide clues on heuristics for the further design of multi-stage epistasis detection methods.
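
Illustrative sketch (not taken from the thesis): the co-information and permutation-test ideas mentioned in contributions 3 and 4 can be made concrete for a single pair of SNPs. The abstract does not give the exact definition of CII or of the EpiMiner test statistic, so the code below is only a minimal example under stated assumptions. It uses the textbook three-variable quantity I(A,B;Y) - I(A;Y) - I(B;Y), which equals the co-information of (A, B, Y) up to the sign convention. The function names (entropy, mutual_information, interaction_gain, permutation_p_value) and the toy binary-coded genotypes are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(*columns):
    """Shannon entropy in bits of the joint distribution of the given columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def mutual_information(xs, y):
    """I(X1..Xk; Y) = H(X) + H(Y) - H(X, Y), where xs is a list of SNP columns."""
    return entropy(*xs) + entropy(y) - entropy(*xs, y)


def interaction_gain(a, b, y):
    """Information about y carried by the pair (a, b) beyond the two
    single-SNP effects. Assumption: this is the three-variable
    co-information up to sign, not the thesis's CII itself."""
    return (mutual_information([a, b], y)
            - mutual_information([a], y)
            - mutual_information([b], y))


def permutation_p_value(a, b, y, n_perm=200, seed=0):
    """Empirical p-value: how often a shuffled phenotype gives an
    interaction gain at least as large as the observed one."""
    rng = random.Random(seed)
    observed = interaction_gain(a, b, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if interaction_gain(a, b, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (hypothetical): a pure XOR model with no marginal effects.
    a = [0, 0, 1, 1] * 25
    b = [0, 1, 0, 1] * 25
    y = [i ^ j for i, j in zip(a, b)]
    print("I(A;Y) =", mutual_information([a], y))
    print("interaction gain =", interaction_gain(a, b, y))
    print("permutation p-value =", permutation_p_value(a, b, y))
```

In this toy XOR model each SNP alone carries no information about the phenotype while the pair determines it completely, which is the kind of epistasis model "that displays no marginal effects" referred to in contribution 1. Real genotypes would normally take three values (0, 1, 2) per SNP; the functions above do not depend on the coding.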
Keywords/Search Tags: Genome-Wide Association Studies (GWAS), Single Nucleotide Polymorphism (SNP), SNP Interaction, Pathogenic Model, Complex Disease