A Gene Interaction Mining Method Based On Information Gain

Posted on: 2015-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: D L Huang
GTID: 2180330422490929
Subject: Computer Science and Technology
Abstract/Summary:
Following the early successes of genome-wide association studies (GWAS), GWAS has been widely used to study the genetic mechanisms of human disease. However, its results often fail to explain the genetic mechanisms of many complex diseases. The main reason is that complex diseases generally arise from the joint action of two or more genes, while the effect of any single gene is very slight. To solve this problem, the study of gene-gene interaction is critical. Currently, there are many SNP-based gene-gene interaction mining methods. These use the SNP as the basic unit, study the interaction between two SNPs, and then infer whether there is an interaction between the genes in which the two SNPs are located. However, the gene is the basic unit of functional expression, and a gene contains many SNPs, so an interaction between two SNPs does not by itself indicate an interaction between the genes. To avoid this problem, we propose a gene-based information gain model to mine gene-gene interactions.

The gene-based information gain model proposed in this paper is designed from the perspective of all the SNPs of a gene and works on case-control data. It uses the basic theory of information entropy and information gain to detect gene-gene interactions. In this model the whole gene is the basic unit, and all of its SNPs are taken into account. Compared with SNP-based models, the gene-based information gain model avoids the assumption that a single SNP is sufficient to represent the whole gene, and it can explain the genetic mechanisms of disease from a biological perspective.

To assess the performance of the gene-based information gain model, we designed both simulation experiments and a real-data validation, and compared the model with the SNP-based entropy model and the gene-based KCCU model. In the simulation experiments, two evaluation indices were selected: power and the false positive rate. Through simulation, we observed how power varies with the odds ratio (OR), sample size and prevalence rate, and analyzed the false positive rate when there is no interaction between the two genes. In the real-data validation, three genes associated with rheumatoid arthritis were selected: PADI6, SERPINA1 and VDR. In both the simulation experiments and the real-data validation, the power of the gene-based information gain model was better than that of both the SNP-based entropy model and the KCCU model. The results validate the gene-based information gain model.
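The abstract does not give the model's formula, so the following is only a minimal Python sketch of the general idea of information gain applied to case-control data. It uses a standard entropy-based interaction measure: the information that two variables carry jointly about disease status, minus what each carries alone. The function names and the toy data are illustrative assumptions, not the thesis's actual gene-based model. Each variable may be a single SNP genotype or any per-sample summary of a gene, and how the thesis combines all SNPs of a gene is not specified here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, features):
    """H(labels | features): entropy of labels within each feature value,
    weighted by how often that feature value occurs."""
    n = len(labels)
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(f, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def mutual_information(labels, features):
    """Information gain of features about labels: H(Y) - H(Y | X)."""
    return entropy(labels) - conditional_entropy(labels, features)


def interaction_gain(var_a, var_b, status):
    """Assumed interaction measure (not taken from the thesis):
    I(A,B; status) - I(A; status) - I(B; status).
    Positive values suggest the pair is more informative jointly
    than the two variables are separately."""
    joint = list(zip(var_a, var_b))
    return (mutual_information(status, joint)
            - mutual_information(status, var_a)
            - mutual_information(status, var_b))


# Toy data (hypothetical): status depends on A and B only in combination.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
status_interaction = [0, 1, 1, 0]   # pure interaction, no main effects
status_main_effect = [0, 0, 1, 1]   # driven by A alone

print(interaction_gain(a, b, status_interaction))
print(interaction_gain(a, b, status_main_effect))
```

In the first toy case neither variable is informative alone but the pair determines status completely, so the measure is about 1 bit. In the second case status is explained by one variable alone and the measure is about 0. With real genotype data and many SNPs per gene, the joint genotype table becomes sparse, which is one reason a gene-level model needs more care than this sketch shows.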
Keywords/Search Tags: complex disease, gene-gene interaction, the whole gene, information gain