
Research On Genome-Wide High-Order Epistasis Identification Methods

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Cao
GTID: 2370330611464268
Subject: Computer software and theory
Abstract/Summary:
In genome-wide association studies (GWAS), using single nucleotide polymorphism (SNP) markers to explore complex human diseases has become one of the hot topics in molecular genetics. Traditional GWAS focus only on the association between a single risk SNP and the disease. It is widely acknowledged, however, that complex diseases are often affected by interactions among multiple genes, or between genes and the environment. Research on SNP-SNP interactions (epistasis) in complex human diseases will therefore further the understanding of the theoretically estimated heritability of diseases and alleviate the "missing heritability" left by traditional GWAS. A variety of epistasis detection methods have been proposed, but most of them target two-order epistasis. The biggest challenge in high-order epistasis detection is the heavy computational burden caused by the high dimensionality of genome-wide datasets. Screening SNPs makes it possible to detect higher-order epistasis on genome-wide datasets, and the key is to define appropriate screening strategies. This dissertation conducts in-depth research on genome-wide high-order epistasis identification methods. The main contributions are as follows:

(1) High-order SNP-SNP interaction detection based on significant statistical patterns and a fast permutation test (HiSSI). Bonferroni correction for multiple hypothesis testing in GWAS is often overly conservative, which increases the probability of false negatives. This work proposes a screening strategy that combines significant statistical patterns with a fast permutation test. The family-wise error rate (FWER) is introduced to control false positives, so that significant two-locus combinations are screened into a candidate set. Depending on the number of combinations in the candidate set, one of two search methods is employed, exhaustive search or heuristic search, which allows HiSSI to detect more epistatic interactions within a practical running time. Simulation studies demonstrate that HiSSI performs well in detecting high-order interactions. A study on a real dataset demonstrates the utility of HiSSI for high-order epistasis identification on GWAS data.

(2) High-order epistasis identification based on clustering and mutual information (ClusterMI). Although HiSSI introduces a fast permutation test to improve its efficiency, it exhaustively analyses all two-locus combinations over the whole search space, which is time-consuming. Most screening strategies based on two-locus combinations suffer from the same problem. This work proposes a screening strategy that combines clustering and mutual information. ClusterMI uses clustering to divide SNPs into multiple clusters, with mutual information measuring the association between SNPs. Within each cluster, conditional mutual information is used to screen two-locus combinations that are significantly associated with the disease, and these form the candidate set. Clustering reduces the search space of two-locus combinations and improves computational efficiency. Based on the candidate set, ClusterMI uses the chi-square test or an ant colony optimization (ACO) algorithm to detect high-order epistasis. Experiments on various simulation datasets indicate that ClusterMI outperforms related methods in both effectiveness and efficiency. On real datasets, ClusterMI detects some significant high-order epistatic interactions that are hard for related methods to detect.

(3) High-order epistasis identification based on dual screening and multifactor dimensionality reduction (DualWMDR). Both HiSSI and ClusterMI screen a significant candidate set based on the interaction effect between two SNPs, and most existing SNP screening strategies follow the same idea. In practice, disease is caused by both single SNPs and SNP-SNP interactions. Considering both the single-locus effect and the interaction effect of SNPs, this work proposes a dual screening strategy. In the first screening, DualWMDR combines clustering and part mutual information (PMI) to exclude noisy SNPs, and divides the remaining SNPs into multiple clusters. In the second screening, both the single-locus effect and the interaction effect of SNPs are considered to select the optimal SNPs in each cluster. DualWMDR then uses weighted multifactor dimensionality reduction (WMDR) to detect epistasis among the optimal SNPs. Simulation studies in different scenarios show that DualWMDR performs better than other representative methods. Studies on real datasets show the effectiveness of DualWMDR in identifying high-order epistasis on genome-wide datasets.

(4) High-order epistasis identification based on an ensemble of multi-type detectors (EnSSI). HiSSI, ClusterMI, DualWMDR and existing identification methods all use an individual detector (or detectors of the same type) to screen the candidate set and detect epistasis, which leads to unfavorable results because of detector bias and disease complexity. Both ClusterMI and DualWMDR use clustering to partition the search space and reduce the computational burden, but this increases the risk of missing significant combinations and reduces identification performance. To alleviate these problems, this work proposes an ensemble strategy that combines multi-type detectors to screen SNPs and detect epistasis. Within the ensemble screening framework, to reduce the computational burden of each detector and the risk of missing significant combinations, EnSSI designs a three-stage (score-switch-filter) iteration strategy that continuously outputs significant two-locus combinations to form the candidate set. Based on the candidate set, EnSSI uses multi-type detectors to jointly determine epistasis. Simulation studies on two-locus and three-locus epistatic models indicate that EnSSI outperforms single-detector methods. A study on a real GWAS dataset demonstrates the effectiveness and efficiency of EnSSI in detecting high-order epistasis on genome-wide data.
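
The abstract does not give the statistics or algorithms in detail, so the following is only a minimal sketch of the general idea shared by the screening strategies above: score every two-locus combination against the phenotype, then keep the combinations whose score exceeds a permutation-based threshold that controls the FWER. It assumes mutual information as the score and a max-statistic permutation test as the FWER procedure. All function names and parameters (`mutual_information`, `pair_scores`, `fwer_threshold`, `screen_pairs`, `n_perm`, `alpha`) are hypothetical and are not taken from HiSSI, ClusterMI, DualWMDR or EnSSI.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def pair_scores(genotypes, phenotype):
    """Score each two-locus combination (i, j) by I((SNP_i, SNP_j); phenotype).

    genotypes: list of SNP columns, each a list of genotype codes (e.g. 0/1/2).
    phenotype: list of case/control labels, one per sample.
    """
    scores = {}
    for i, j in combinations(range(len(genotypes)), 2):
        joint = list(zip(genotypes[i], genotypes[j]))
        scores[(i, j)] = mutual_information(joint, phenotype)
    return scores


def fwer_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Max-statistic permutation threshold (an assumed FWER procedure).

    Shuffles the phenotype labels n_perm times, records the largest pair
    score of each shuffle, and returns the (1 - alpha) quantile of those maxima.
    """
    rng = random.Random(seed)
    labels = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        maxima.append(max(pair_scores(genotypes, labels).values()))
    maxima.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[idx]


def screen_pairs(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Return the candidate set: pairs scoring above the permutation threshold,
    ordered from highest to lowest score."""
    threshold = fwer_threshold(genotypes, phenotype, n_perm, alpha, seed)
    scores = pair_scores(genotypes, phenotype)
    kept = [pair for pair, s in scores.items() if s > threshold]
    return sorted(kept, key=scores.get, reverse=True)
```

Because the threshold is taken from the distribution of the maximum score over all pairs, a single comparison per pair accounts for the multiple tests without dividing the significance level by the number of tests, as Bonferroni correction would. The sketch still enumerates every pair, which is exactly the cost that the clustering step described in (2) is meant to reduce.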
Keywords/Search Tags: Genome-wide association studies (GWAS), Single nucleotide polymorphism (SNP), SNP-SNP interaction, High-order epistasis, Screening strategy