
Research on Swarm Intelligence Optimization Algorithms for Epistasis Detection in Genome-Wide Association Studies

Posted on: 2022-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wu
GTID: 2480306329499014
Subject: Computer technology
Abstract/Summary:
With the implementation of the Human Genome Project (HGP) and the progress of biotechnology, bioinformatics emerged and has developed rapidly. Genome-wide association studies (GWAS) are one of its important research topics. GWAS opens a path to the study of polygenic diseases: it can uncover a large number of previously unknown single-nucleotide polymorphism (SNP) markers, giving scientists more clues for investigating such diseases. Within GWAS, epistasis detection plays a vital role in studying the genetic causes of complex diseases.

In recent years, researchers have developed many methods for epistasis detection. Many of them rely on exhaustive search and therefore cannot be applied to genome-wide data sets. Some algorithms do scale to large data sets, but the number of SNP interactions to evaluate and the resulting multiple testing lead to heavy computation, high complexity, and many false positive results. It is therefore of great significance to develop a more effective epistasis detection algorithm that works at the genome-wide level and has a low false positive rate. To this end, this thesis studies swarm intelligence optimization algorithms for genome-wide epistasis detection.

Designing such an algorithm raises several difficulties. The evaluation function that measures the association between SNPs is hard to design, and a poorly chosen function tends to select SNP interactions that are only weakly associated with the disease. For the ant colony algorithm in particular, designing the heuristic information is also difficult, and the available heuristic information is often insufficient. How to build on existing work to design heuristic information suited to epistasis detection is a further challenge. Even once these problems are solved, the small set of SNP combinations selected by the algorithm still contains considerable noise and many false positives, so further testing and screening are needed.

To address these problems, we designed ACO-GAB, an epistasis detection algorithm based on swarm intelligence optimization, and ACO-FHG, a two-stage epistasis detection algorithm based on swarm intelligence optimization.

The main innovation of ACO-GAB lies in its fitness function, which combines the Gini impurity, the AIC score from logistic regression, and the K2 score from the Bayesian evaluation criterion. All three have been used as measures in epistasis detection, and each has advantages and disadvantages. Combining them so that their strengths complement one another and their weaknesses are offset allows the association between a SNP combination and the disease to be measured effectively. Comparative experiments verify the rationality of the fitness function in ACO-GAB, but ACO-GAB still has certain problems, so we designed ACO-FHG to improve on it.

ACO-FHG has two main innovations:
1) For the ant colony algorithm, heuristic information is often difficult to obtain because there is no prior knowledge. ACO-FHG incorporates SMUC and multi-SURF* as heuristic information into the decision rule of the ant colony algorithm, effectively guiding the ants in their search for epistasis.
2) To avoid excessive noise and a high false positive rate, ACO-FHG adopts a two-stage method. After the ant colony algorithm selects a small subset of SNPs as candidate solutions, a G-test screens them a second time, which effectively reduces false positive results.

Comparing ACO-FHG with other algorithms on multiple simulated data sets verifies its advantages and usability in detecting epistatic interactions. Experiments on real data sets identified disease genes associated with complex diseases, showing that ACO-FHG can support epistasis detection at the genome-wide level. Future work will address the growth in time complexity as the number of SNPs increases, and will explore more efficient methods or deployment on a big data platform to reduce long computation times and low efficiency.
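
The second-stage screening mentioned in the abstract can be illustrated with a G-test on the contingency table of genotype combinations versus case/control status. The following is a minimal Python sketch, not the thesis's implementation: the function names `contingency_table` and `g_test`, the integer genotype coding, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def contingency_table(genotypes, labels, snp_indices):
    """Count samples for each (genotype combination, class label) pair.

    genotypes   -- list of per-sample genotype lists (hypothetical integer coding)
    labels      -- list of class labels, e.g. 0 = control, 1 = case
    snp_indices -- which SNP columns form the candidate combination
    """
    counts = Counter()
    for row, label in zip(genotypes, labels):
        combo = tuple(row[i] for i in snp_indices)
        counts[(combo, label)] += 1
    return counts


def g_test(counts):
    """Return (G statistic, degrees of freedom) for a contingency table.

    G = 2 * sum(O * ln(O / E)), where E is the expected count under
    independence of genotype combination and class label.
    """
    combos = sorted({combo for combo, _ in counts})
    classes = sorted({label for _, label in counts})
    total = sum(counts.values())
    row_tot = {c: sum(counts.get((c, y), 0) for y in classes) for c in combos}
    col_tot = {y: sum(counts.get((c, y), 0) for c in combos) for y in classes}

    g = 0.0
    for c in combos:
        for y in classes:
            observed = counts.get((c, y), 0)
            if observed == 0:
                continue  # the term O * ln(O / E) tends to 0 as O -> 0
            expected = row_tot[c] * col_tot[y] / total
            g += observed * math.log(observed / expected)
    dof = (len(combos) - 1) * (len(classes) - 1)
    return 2.0 * g, dof


# Toy data (assumed): two SNPs whose joint pattern determines the label,
# while neither SNP is informative on its own.
genotypes = [[0, 0]] * 10 + [[0, 1]] * 10 + [[1, 0]] * 10 + [[1, 1]] * 10
labels = [0] * 10 + [1] * 10 + [1] * 10 + [0] * 10

g_pair, dof_pair = g_test(contingency_table(genotypes, labels, [0, 1]))
g_single, dof_single = g_test(contingency_table(genotypes, labels, [0]))
print(g_pair, dof_pair)
print(g_single, dof_single)
```

On this toy data the pair of SNPs yields a large G statistic while the first SNP alone yields zero, which is the kind of interaction-only signal a second-stage test is meant to confirm. To obtain a p-value, G is compared against a chi-square distribution with `dof` degrees of freedom.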
Keywords/Search Tags:Epistasis Detection, Swarm Intelligence Optimization Algorithm, Ant Colony Algorithm, G-test, Genome-wide Association Analysis