
The Research On Epistasis Detection Algorithm In Genome-wide Association Study

Posted on: 2021-01-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Sun
GTID: 1360330623477244
Subject: Computer application technology
Abstract/Summary:
With the progress of science and technology and the advent of the big-data era, large volumes of data have emerged in many research fields. This brings unprecedented opportunities and challenges to researchers in computer science and technology, and how to mine the information we need from such huge data sets is one of the hot topics in information science. Genome-wide association study (GWAS) data are a kind of high-dimensional biological data: a typical data set contains hundreds of thousands of single nucleotide polymorphisms (SNPs) measured in thousands of normal samples and disease samples. By analyzing these data, researchers hope to reveal the relationship between SNPs and diseases and thereby advance disease research. Complex diseases are common diseases caused by multiple genes and factors, and in recent years GWAS has become one of the main means of studying them. Detecting combinations of epistatic SNPs in GWAS is therefore of great significance for explaining, preventing and treating complex diseases.

Epistasis detection in GWAS is a very complex, high-dimensional combinatorial optimization problem. In recent years, researchers have proposed many methods for it, which we classify into five categories: exhaustive methods, random methods, filtering methods, model-based methods and evolutionary methods. These algorithms mainly suffer from the following four problems, which affect their accuracy and efficiency:

1. Many algorithms are designed around a single function that measures the relationship between a SNP combination and the disease. When the underlying disease model does not satisfy the assumptions built into that function, the function fails, which limits the accuracy and detection power of the algorithm.
2. Many detection algorithms are built on classical swarm intelligence optimization algorithms. These are often not tailored to the epistasis detection problem and do not make full use of the characteristics of GWAS data.
3. Detection algorithms based on swarm intelligence optimization are strongly biased toward SNPs with strong marginal effects. This works against detecting SNPs without marginal effects.
4. With the development of bioinformatics, a large number of biological databases have emerged and matured, and they have become indispensable tools in current bioinformatics research. However, most current epistasis detection algorithms have not tried to use these databases to improve their ability to detect epistasis.

To address these problems, this dissertation systematically studies epistasis detection algorithms for GWAS and proposes four of them:

1. HS-MMGKG is an evolutionary method based on harmony search. A single function is a weak measure of the relationship between a SNP combination and the disease. To compensate, HS-MMGKG maintains five harmony memories at the same time, each tied to one such function: MDR, MI, Gini, K2 and G. These functions come from different fields and assess the relationship from different perspectives, so they complement one another. Experimental results show that HS-MMGKG detects epistasis better than other classical algorithms.
2. SEE is a further improvement of HS-MMGKG. Through our study of epistasis detection and GWAS data, we realized that many classical swarm intelligence optimization frameworks still have shortcomings when applied to epistasis detection. We therefore refine and adjust the traditional swarm intelligence optimization framework and propose a new swarm intelligence optimization algorithm. We propose that the relationship between a SNP combination and the disease be divided into two types: association and association source. SEE uses four functions to measure association and four newly designed functions to measure association source. These eight functions are fused by a sorting strategy. Experimental results show that SEE improves considerably on other algorithms in both accuracy and running time.
3. SHEIB is a random method. Compared with detection algorithms based on swarm intelligence optimization, SHEIB focuses more on detecting epistasis without marginal effects. SHEIB introduces a K2-based strategy for detecting [2, mo]-order epistasis within a combination of mo SNPs, and with this strategy it can detect epistasis of any order. We also propose two hypotheses about epistasis. Based on them, SHEIB can use "gene mapping data" and "gene association data" to further improve its detection power. Experimental results show that SHEIB detects epistasis much better than other algorithms and that using "gene mapping data" and "gene association data" improves it further.
4. SHEIB-AGM is proposed because SHEIB does not achieve the desired effect when using "gene association data". SHEIB-AGM introduces an "automatic gene matrix" into SHEIB to replace the "gene association data" constructed from biological databases. Experimental results show that, provided the necessary biological data are available, SHEIB-AGM detects epistasis better than SHEIB.

We conducted extensive experiments with the four proposed algorithms on both simulated and real data sets, and the results show that the algorithms studied and designed in this dissertation achieve excellent results. These algorithms should help advance GWAS and research on complex, common diseases. Their underlying ideas may also benefit algorithm development in other fields of computer science. Research on the theory of computer algorithms provides strong support for epistasis detection in bioinformatics, and in turn, results from bioinformatics research will play an important role in promoting the development of computer science.
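HS-MMGKG judges each SNP combination with several functions that look at the data from different perspectives. As a rough illustration, the Python sketch below builds a case/control contingency table for a combination and computes four common measures: the K2 score, mutual information, a weighted Gini impurity and the G-test statistic. It rests on several assumptions:

- MI denotes mutual information and G denotes the G-test.
- Genotypes are coded as small integers, and labels are 0 (control) or 1 (case).
- The function names are hypothetical.
- The formulas are common textbook forms, not the dissertation's exact definitions.
- MDR is omitted for brevity.

```python
from math import lgamma, log


def contingency(genotypes, labels, combo):
    """Map each observed genotype pattern of the SNPs in `combo` to [controls, cases]."""
    table = {}
    for row, y in zip(genotypes, labels):  # y: 0 = control, 1 = case
        counts = table.setdefault(tuple(row[s] for s in combo), [0, 0])
        counts[y] += 1
    return table


def _class_totals(table):
    """Total number of controls and cases across all genotype patterns."""
    return [sum(c[0] for c in table.values()), sum(c[1] for c in table.values())]


def k2_score(table):
    """Cooper-Herskovits K2 score as a negative log-likelihood; lower = stronger association."""
    return sum(lgamma(n0 + n1 + 2) - lgamma(n0 + 1) - lgamma(n1 + 1)
               for n0, n1 in table.values())


def mutual_information(table):
    """I(genotype pattern; phenotype) in nats; higher = stronger association."""
    n = sum(map(sum, table.values()))
    totals = _class_totals(table)
    mi = 0.0
    for counts in table.values():
        n_x = sum(counts)
        for j, n_xy in enumerate(counts):
            if n_xy:  # empty cells contribute nothing
                mi += (n_xy / n) * log(n_xy * n / (n_x * totals[j]))
    return mi


def gini_impurity(table):
    """Phenotype Gini impurity weighted over genotype patterns; lower = stronger association."""
    n = sum(map(sum, table.values()))
    return sum((n0 + n1) / n * (1 - (n0 / (n0 + n1)) ** 2 - (n1 / (n0 + n1)) ** 2)
               for n0, n1 in table.values())


def g_statistic(table):
    """G-test statistic 2 * sum(O * ln(O / E)); higher = stronger association."""
    n = sum(map(sum, table.values()))
    totals = _class_totals(table)
    g = 0.0
    for counts in table.values():
        n_x = sum(counts)
        for j, observed in enumerate(counts):
            if observed:
                expected = n_x * totals[j] / n  # expected count under independence
                g += observed * log(observed / expected)
    return 2 * g
```

The measures do not point in the same direction. Lower K2 and Gini values indicate a stronger association, while higher MI and G values do. A search that keeps one memory per measure therefore has to compare candidates in each measure's own direction.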
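The harmony search at the core of HS-MMGKG can be pictured with a generic single-memory version over k-SNP combinations. The sketch below is not the HS-MMGKG implementation. The parameters `hmcr` (harmony memory considering rate) and `par` (pitch adjusting rate) follow standard harmony search terminology. The ±1 pitch adjustment, the handling of duplicate SNPs and the function names are assumptions made for illustration. A multi-memory variant would run one such memory per scoring function.

```python
import random


def improvise(memory, n_snps, hmcr=0.9, par=0.3, rng=random):
    """Build one new k-SNP combination from a harmony memory of sorted tuples.

    Assumes n_snps > k so that distinct SNPs can always be found.
    """
    k = len(memory[0])
    new = []
    for pos in range(k):
        if rng.random() < hmcr:
            # Memory consideration: reuse the SNP at this position of a stored harmony.
            snp = rng.choice(memory)[pos]
            if rng.random() < par:
                # Pitch adjustment (assumed): move to a neighbouring SNP index.
                snp = min(n_snps - 1, max(0, snp + rng.choice((-1, 1))))
        else:
            # Random selection from the whole SNP range.
            snp = rng.randrange(n_snps)
        # Keep the SNPs in a combination distinct.
        while snp in new:
            snp = rng.randrange(n_snps)
        new.append(snp)
    return tuple(sorted(new))


def harmony_search(score, n_snps, k=2, hms=10, iterations=500, seed=0):
    """Minimise score(combo) with a single harmony memory; return [(combo, score)] best-first."""
    rng = random.Random(seed)
    memory = [tuple(sorted(rng.sample(range(n_snps), k))) for _ in range(hms)]
    scores = [score(h) for h in memory]
    for _ in range(iterations):
        candidate = improvise(memory, n_snps, rng=rng)
        if candidate in memory:
            continue
        s = score(candidate)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:
            # Replace the worst harmony with the better new one.
            memory[worst], scores[worst] = candidate, s
    order = sorted(range(hms), key=scores.__getitem__)
    return [(memory[i], scores[i]) for i in order]
```

The `score` argument can be any function that maps a SNP tuple to a number where lower is better. A measure where higher is better can be negated before it is passed in.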
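SEE fuses its eight measures with a sorting strategy, but the abstract does not give the exact rule. The sketch below assumes one simple possibility:

1. Rank the candidates separately under each measure, respecting whether higher or lower is better.
2. Sum the ranks for each candidate.
3. Sort the candidates by that total.

The function name and the rank-sum rule are hypothetical.

```python
def rank_fusion(candidates, objectives):
    """Fuse several scores by summed rank (hypothetical rule); returns candidates best-first.

    objectives: list of (score_fn, higher_is_better) pairs.
    """
    total = {c: 0 for c in candidates}
    for score_fn, higher_is_better in objectives:
        # Rank 0 is the best candidate under this objective.
        ordered = sorted(candidates, key=score_fn, reverse=higher_is_better)
        for rank, c in enumerate(ordered):
            total[c] += rank
    # Python's sort is stable, so ties keep the input order.
    return sorted(candidates, key=lambda c: total[c])
```

Working with ranks rather than raw values means measures on very different scales, such as a log-likelihood and an impurity, can be combined without normalization.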
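The abstract does not describe SHEIB's K2-based strategy for [2, mo]-order epistasis in detail. One way to picture the idea is to score every sub-combination of order 2 to mo within a candidate of mo SNPs with K2 and keep the lowest-scoring one. The sketch below does this exhaustively. Its function names are hypothetical, and it is not the dissertation's algorithm.

```python
from itertools import combinations
from math import lgamma


def k2(genotypes, labels, combo):
    """K2 score (negative log marginal likelihood) of a SNP combination; lower is better."""
    cells = {}
    for row, y in zip(genotypes, labels):  # y: 0 = control, 1 = case
        counts = cells.setdefault(tuple(row[s] for s in combo), [0, 0])
        counts[y] += 1
    return sum(lgamma(n0 + n1 + 2) - lgamma(n0 + 1) - lgamma(n1 + 1)
               for n0, n1 in cells.values())


def best_sub_combination(genotypes, labels, combo):
    """Score every sub-combination of order 2..len(combo) with K2; return (best, score)."""
    scored = [(sub, k2(genotypes, labels, sub))
              for order in range(2, len(combo) + 1)
              for sub in combinations(combo, order)]
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy data: 3 binary SNPs, every genotype pattern repeated 4 times.
    rows = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 4
    # Case iff SNPs 0 and 1 match: an interaction with no marginal effect.
    labels = [int(r[0] == r[1]) for r in rows]
    print(best_sub_combination(rows, labels, (0, 1, 2)))
```

In the toy data, neither SNP 0 nor SNP 1 changes the case rate on its own. Even so, the pair (0, 1) gets a lower K2 score than the other two pairs and the full triple, so it is the sub-combination returned.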
Keywords/Search Tags: genome-wide association study, epistasis, swarm intelligence optimization algorithm, complex disease, harmony search, single nucleotide polymorphism