
Research On Genome-Wide Association Analysis Algorithms Based On MDR

Posted on: 2020-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Tang
Full Text: PDF
GTID: 2370330590486904
Subject: Computer software and theory
Abstract/Summary:
Genome-wide association studies (GWAS) are an effective method for screening disease-related SNPs: they analyze case-control or random population data, using genotype information at millions of SNP loci in the human genome as genetic markers. Human complex diseases are mainly affected by gene-gene interaction (GGI). Currently, genome-wide association analysis mostly tests the statistical association between a disease and a single SNP locus. However, single-gene effects alone cannot explain the genetic basis and complex traits of most complex diseases. Therefore, more efficient algorithms are needed to detect disease-related multi-gene interactions. Multi-factor dimensionality reduction (MDR) is a non-parametric, model-free method for revealing gene-gene and gene-environment interactions associated with common complex diseases, and it is suitable for case-control studies. The K-nearest neighbor (K-NN) algorithm is an efficient classification algorithm based on a simple principle. Multi-objective optimization is a mechanism for solving problems with multiple interacting or conflicting objectives. On this basis, to address the false positives and inefficiency of the multi-factor dimensionality reduction algorithm in some cases, this paper proposes a new multi-factor dimensionality reduction algorithm based on a multi-objective optimization mechanism and K-NN classification (MK-MDR) for detecting gene-gene interactions in genome-wide association analysis. The MK-MDR algorithm is divided into four parts. The first part covers parameter initialization and data set partitioning. In the second part, the K-nearest neighbor algorithm classifies the samples as high or low risk, and a two-way contingency table is generated from the high/low-risk and case/control attributes of the samples. In the third part, the table is transformed into two values for assessing the association between an SNP locus combination and the disease: Balanced Correct Classification Rate (BCCR) and Likelihood Ratio (LR). A multi-objective function is constructed from BCCR and LR to perform multi-objective optimization over SNP locus combinations. In the fourth part, cross-validation is carried out, and the SNP locus combination with the highest cross-validation consistency and the lowest error rate is selected as the final model. The performance of the MK-MDR algorithm is tested on simulated and real data sets and compared with popular genome-wide association analysis algorithms such as TEAM and BOOST. The experimental results indicate that the MK-MDR algorithm outperforms the other algorithms in efficiency and other measures. On real AMD data sets, the MK-MDR algorithm is feasible for detecting gene-gene interactions associated with disease.
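The third part of MK-MDR, which turns the two-way contingency table into BCCR and LR and then optimizes over both, can be illustrated with a minimal sketch. The abstract does not give the formulas, so this sketch assumes the usual definitions: BCCR as the mean of sensitivity and specificity, and LR as the likelihood-ratio (G) statistic of the 2x2 table. It also assumes that "multi-objective optimization" means keeping the candidates that are not Pareto-dominated on the pair (BCCR, LR). The function names, table layout and SNP labels are hypothetical and are not taken from the thesis.

```python
import math

# Assumed table layout (hypothetical, not from the thesis):
#   tp = cases classified high risk,    fn = cases classified low risk
#   fp = controls classified high risk, tn = controls classified low risk

def bccr(tp, fn, fp, tn):
    """Balanced correct classification rate: mean of sensitivity and specificity."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (sens + spec)

def likelihood_ratio(tp, fn, fp, tn):
    """G statistic, 2 * sum(O * ln(O / E)), over the four cells of the table."""
    table = [[tp, fn], [fp, tn]]
    n = tp + fn + fp + tn
    if n == 0:
        return 0.0
    row = [tp + fn, fp + tn]
    col = [tp + fp, fn + tn]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row[i] * col[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g

def pareto_front(scores):
    """Names of candidates not dominated on (BCCR, LR); both are maximized."""
    front = []
    for name, (b, l) in scores.items():
        dominated = any(
            b2 >= b and l2 >= l and (b2 > b or l2 > l)
            for other, (b2, l2) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# Made-up counts for three hypothetical SNP locus combinations.
tables = {
    "snpA+snpB": (80, 20, 30, 70),
    "snpC+snpD": (60, 40, 45, 55),
    "snpE+snpF": (50, 50, 50, 50),
}
scores = {k: (bccr(*t), likelihood_ratio(*t)) for k, t in tables.items()}
best = pareto_front(scores)
```

In this toy data one combination is better on both measures, so the front holds a single entry. When BCCR and LR disagree, the front keeps several candidates, and a later step such as the cross-validation described in the abstract would have to choose among them.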
Keywords/Search Tags: Genome-wide association analysis, Multi-factor dimensionality reduction, Gene-gene interaction, K-nearest neighbor, Multi-objective optimization