
Research On SNP Interaction Detection Method Based On Multitasking Ant Colony Optimization

Posted on: 2024-01-16  Degree: Master  Type: Thesis
Country: China  Candidate: C Li
GTID: 2544307061981779  Subject: Computer technology
Abstract/Summary:
Genome-wide association studies (GWAS) are among the most important approaches to investigating the causes of complex human diseases. In recent years, association analysis between single nucleotide polymorphisms (SNPs) and diseases in GWAS has made significant progress. However, the pathogenic factors of complex diseases are diverse, and single-locus association analysis is limited in its ability to explain their pathogenic mechanisms. Nonlinear interactions among multiple SNPs are considered one of the most important pathogenic factors in complex human diseases, but the high-dimensional, small-sample nature of genomic data makes detecting such interactions at the genome-wide level computationally challenging. On the one hand, the number of SNP combinations to be evaluated grows exponentially as the number of SNPs involved in an interaction increases. Existing methods detect only k-order SNP interactions in a single run, so detecting interactions of different orders requires running a separate detection algorithm for each order. On the other hand, because the genetic architecture of complex diseases is unknown, it is very difficult to accurately evaluate the association between multiple SNP loci and a disease, and a single evaluation criterion struggles to identify the variety of SNP disease models. To address these challenges, this thesis focuses on the following three aspects:
(1) To improve the ability to detect diverse pathogenic SNP loci, a multi-criteria ant colony optimization (MCACO) algorithm is proposed. MCACO works in three stages. In the first stage, two ant colonies search in parallel, using the K2-score and the Jensen-Shannon divergence as their respective evaluation criteria, to find candidate SNP combinations associated with disease status. In the second stage, redundant SNP loci are removed from these combinations using a feature importance ranking based on random forest. In the third stage, the statistical significance of the detection results is verified with the G-test. Experiments on 20 simulated disease models show that MCACO reaches nearly 100% detection accuracy on the 12 models with marginal effects and can identify more pathogenic loci in datasets with fewer samples.
(2) To search rapidly for pathogenic SNP combinations of different orders, a multitasking ant colony optimization algorithm for detecting multi-order SNP interactions (MTACO-DMSI) is proposed. MTACO-DMSI detects 2-order, 3-order, ..., k-order SNP interactions in parallel and consists of a search stage and a validation stage. In the search stage, multiple high-order SNP interaction detection tasks are executed in parallel, with two populations per task that use the K2-score and the Jensen-Shannon divergence, respectively, as evaluation criteria; this improves the algorithm's global search ability and its ability to discriminate among diverse disease models. In the validation stage, the G-test is used to verify whether the candidate solutions represent genuine associations. Compared with traditional single-task algorithms, MTACO-DMSI shows stronger detection capability on 20 interaction-effect models and requires fewer computational resources to complete the k-order detection tasks. On three real datasets, it detected pathogenic loci and higher-order SNP combinations reported in the relevant literature, and the classification accuracy of these combinations exceeded 95%.
(3) To further improve the efficiency of knowledge transfer in MTACO-DMSI and strengthen its ability to identify disease models without marginal effects, a multitasking ant colony optimization algorithm based on unified coding (MTACO-UC-DMSI) is proposed for detecting multi-order SNP interactions. MTACO-UC-DMSI uses a unified encoding for all tasks and transfers knowledge between tasks through sequential crossover operations. In addition, because MTACO-DMSI cannot identify some disease models without marginal effects, a new evaluation criterion, the ND_JE-score, is introduced to detect disease-associated combinations without marginal effects. Compared with MTACO-DMSI, the improved MTACO-UC-DMSI maintains its detection ability on the 12 models with marginal effects and achieves more than 80% detection ability on the 6 models without marginal effects.
This study investigates the application of multitasking ant colony optimization to large-scale combinatorial optimization problems. Extensive experiments on both simulated and real data demonstrate the effectiveness of the three proposed algorithms in mining high-order SNP interactions from complex disease datasets. The results not only validate the performance of the detection methods but also show their potential to guide further research on complex diseases and to facilitate the discovery of explanatory findings associated with them.
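The abstract names the K2-score as one evaluation criterion but does not define it. A form commonly used in SNP-interaction work is the negative log of the Bayesian K2 (Cooper-Herskovits) metric: for each observed genotype combination i with r_i samples, of which r_i0 are controls and r_i1 are cases, K2 = sum over i of [ln((r_i + 1)!) - ln(r_i0!) - ln(r_i1!)], where lower values indicate a stronger association. The sketch below assumes that form, genotypes coded 0/1/2 and binary labels (0 = control, 1 = case); the function name `k2_score` is hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def k2_score(genotypes, labels, snps):
    """K2 score of a SNP combination (lower = stronger association).

    Assumed inputs (illustrative, not from the thesis):
      genotypes: list of samples, each a sequence of genotype codes (0/1/2)
      labels:    phenotype per sample, 0 = control, 1 = case
      snps:      indices of the SNPs forming the candidate combination
    """
    # Count samples in each (genotype combination, phenotype) cell.
    cells = Counter()
    for row, y in zip(genotypes, labels):
        cells[(tuple(row[s] for s in snps), y)] += 1

    score = 0.0
    for combo in {combo for combo, _ in cells}:
        r0 = cells[(combo, 0)]  # controls with this genotype combination
        r1 = cells[(combo, 1)]  # cases with this genotype combination
        # ln((r + 1)!) - ln(r0!) - ln(r1!), using lgamma(n + 1) = ln(n!)
        score += math.lgamma(r0 + r1 + 2) - math.lgamma(r0 + 1) - math.lgamma(r1 + 1)
    return score
```

Unobserved genotype combinations contribute ln(1!) = 0, so iterating only over observed cells gives the same total.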
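The second criterion, the Jensen-Shannon divergence, is also left undefined in the abstract. One plausible reading, assumed here, compares the distribution of genotype combinations among cases (P) with that among controls (Q): JS(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M) with M = (P + Q) / 2. With base-2 logarithms it lies in [0, 1], and larger values indicate that the combination separates cases from controls more strongly. The function name `js_divergence_score` and the data layout are hypothetical.

```python
import math
from collections import Counter


def js_divergence_score(genotypes, labels, snps):
    """Base-2 Jensen-Shannon divergence between the genotype-combination
    distributions of cases and controls (higher = stronger association).

    Assumption: labels are 0 = control, 1 = case, and both classes occur.
    """
    case, control = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        combo = tuple(row[s] for s in snps)
        (case if y == 1 else control)[combo] += 1

    n_case, n_control = sum(case.values()), sum(control.values())
    js = 0.0
    for combo in set(case) | set(control):
        p = case[combo] / n_case        # P(combo | case)
        q = control[combo] / n_control  # P(combo | control)
        m = (p + q) / 2                 # mixture distribution M
        # Terms with zero probability contribute 0 (0 * log 0 := 0).
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return js
```

Because K2 is minimized and this divergence is maximized, two colonies guided by them rank candidate combinations differently, which is the intuition behind searching with both criteria.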
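The abstract does not describe MCACO's pheromone model, so the following is only a generic, simplified ant colony search over k-SNP combinations, roughly in the style of a MAX-MIN Ant System with iteration-best reinforcement. The per-SNP pheromone vector, the function name `aco_search`, and every parameter value are illustrative assumptions rather than the thesis's design. The score is maximized, so a K2-based criterion would have to be negated.

```python
import random
from typing import Callable, Tuple


def aco_search(
    n_snps: int,
    k: int,
    score: Callable[[Tuple[int, ...]], float],
    n_ants: int = 20,
    n_iters: int = 50,
    alpha: float = 1.0,    # pheromone exponent
    rho: float = 0.1,      # evaporation rate
    deposit: float = 1.0,  # pheromone added to the iteration-best SNPs
    tau_min: float = 0.1,  # pheromone bounds keep every SNP selectable
    tau_max: float = 10.0,
    seed: int = 0,
) -> Tuple[Tuple[int, ...], float]:
    """Toy ACO returning the best k-SNP combination found and its score."""
    rng = random.Random(seed)
    tau = [1.0] * n_snps  # assumption: one pheromone value per SNP
    best_combo, best_val = None, float("-inf")

    for _ in range(n_iters):
        iter_best, iter_val = None, float("-inf")
        for _ in range(n_ants):
            # Each ant draws k distinct SNPs, proportional to tau ** alpha.
            remaining = list(range(n_snps))
            picks = []
            for _ in range(k):
                weights = [tau[s] ** alpha for s in remaining]
                pick = rng.choices(remaining, weights=weights, k=1)[0]
                remaining.remove(pick)
                picks.append(pick)
            combo = tuple(sorted(picks))
            val = score(combo)
            if val > iter_val:
                iter_best, iter_val = combo, val

        if iter_val > best_val:
            best_combo, best_val = iter_best, iter_val

        # Evaporate all trails, then reinforce the iteration-best combination.
        for s in range(n_snps):
            tau[s] = max(tau_min, (1 - rho) * tau[s])
        for s in iter_best:
            tau[s] = min(tau_max, tau[s] + deposit)

    return best_combo, best_val
```

For example, `aco_search(6, 2, lambda c: len(set(c) & {1, 3}), n_iters=100)` rewards combinations containing SNPs 1 and 3, and the pheromone on those SNPs grows as they keep appearing in iteration-best solutions. A per-SNP pheromone is the simplest choice; pairwise or position-specific trails are common alternatives.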
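Both MCACO and MTACO-DMSI verify candidates with the G-test. For a contingency table of genotype combinations (rows) by disease status (columns), G = 2 * sum of O * ln(O / E), where O is an observed count and E = row total * column total / N. Under independence G is approximately chi-square distributed with (rows - 1)(columns - 1) degrees of freedom. The sketch below assumes the table is built only from genotype combinations actually observed in the data, which affects the degrees of freedom; the name `g_test` is hypothetical.

```python
import math
from collections import Counter


def g_test(genotypes, labels, snps):
    """G-test of independence between a SNP combination and disease status.

    Returns (G statistic, degrees of freedom). Only genotype combinations that
    occur in the data form rows of the table (an assumption of this sketch).
    """
    table = Counter()
    for row, y in zip(genotypes, labels):
        table[(tuple(row[s] for s in snps), y)] += 1

    combos = sorted({c for c, _ in table})
    classes = sorted({y for _, y in table})
    n = sum(table.values())
    row_tot = {c: sum(table[(c, y)] for y in classes) for c in combos}
    col_tot = {y: sum(table[(c, y)] for c in combos) for y in classes}

    g = 0.0
    for c in combos:
        for y in classes:
            observed = table[(c, y)]
            if observed > 0:  # cells with O = 0 contribute 0
                expected = row_tot[c] * col_tot[y] / n
                g += observed * math.log(observed / expected)
    g *= 2
    dof = (len(combos) - 1) * (len(classes) - 1)
    return g, dof
```

If SciPy is available, `scipy.stats.chi2.sf(g, dof)` gives the corresponding p-value, and `scipy.stats.chi2_contingency(observed, correction=False, lambda_="log-likelihood")` computes the same G statistic directly from a 2-D array of counts (`correction=False` disables the Yates correction that would otherwise be applied when there is one degree of freedom).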
Keywords/Search Tags: Complex diseases, Multi-tasking optimization, Ant colony optimization, Single nucleotide polymorphisms, High-order SNP interaction