Complex diseases have a complicated etiology that involves genetic factors and may also involve environmental factors. Linkage analysis, genome-wide association studies (GWAS) and, more recently, whole-genome sequencing have been widely applied and have advanced the identification of genes that cause complex diseases. Even so, identifying causative genes and elucidating the molecular mechanisms of complex diseases remain major challenges in genetics. Known causative genes of the same disease phenotype tend to be closely functionally linked. Candidate disease-causing genes can therefore be found by searching the genome for genes that are functionally associated with known disease genes. The accumulation of diverse omics data makes it possible to predict candidate genes for complex diseases by integrating these data in bioinformatics models. In this paper, we use a random forest model and gene association network-based pathway analysis (GANPA) to integrate 12 types of data describing gene characteristics and 3 types of data describing gene-gene interactions. From these we construct an effective disease candidate gene prediction model named RF-GANPA. We apply the model to three examples. The first example is candidate gene prediction for primary immunodeficiency disease (PID). We predicted 800 candidate genes and then prioritized them by gene expression data analysis. We compared RF-GANPA with four other methods, and it predicted the largest number of PID genes newly identified from 2009 onward, which demonstrates the strong performance of the model. The second example is the prediction of schizophrenia candidate genes: 18 candidate genes predicted by RF-GANPA were validated by significant SNP sites in schizophrenia GWAS data. The third example is the prediction of candidate genes for 22 categories of human genetic disease, and these candidates were validated using mouse phenotype information.
The candidate genes for each disease group were enriched in mouse phenotypes that were also enriched among the known disease genes. In summary, RF-GANPA is a reliable and broadly applicable candidate gene prediction tool. It can therefore provide high-quality candidate gene lists for complex disease research.
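The mouse phenotype validation checks whether a disease group's candidate genes carry a given phenotype annotation more often than chance would predict. The abstract does not name the statistical test. One common choice is a one-sided hypergeometric test, shown below as a minimal sketch under that assumption. The function names `hypergeom_enrichment_p` and `phenotype_enrichment` and the gene identifiers are hypothetical, and this is not presented as the RF-GANPA implementation.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_annotated, n_selected, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric(n_total, n_annotated, n_selected).

    n_total:     background genes considered
    n_annotated: background genes carrying the mouse phenotype
    n_selected:  candidate genes tested (drawn from the background)
    n_overlap:   candidate genes carrying the phenotype
    """
    denom = comb(n_total, n_selected)
    upper = min(n_annotated, n_selected)
    # Exact integer sum over the upper tail, divided once at the end.
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


def phenotype_enrichment(background, phenotype_genes, candidates):
    """Return (overlap, p-value) for one phenotype and one candidate list.

    Hypothetical helper: restricts both gene sets to the background first,
    so genes outside the background do not distort the test.
    """
    background = set(background)
    annotated = set(phenotype_genes) & background
    selected = set(candidates) & background
    overlap = len(annotated & selected)
    p = hypergeom_enrichment_p(len(background), len(annotated), len(selected), overlap)
    return overlap, p


if __name__ == "__main__":
    # Toy, made-up gene sets for illustration only.
    background = [f"g{i}" for i in range(10)]
    phenotype = ["g0", "g1", "g2", "g3", "g4"]
    candidates = ["g0", "g1", "g2", "g3", "g4"]
    print(phenotype_enrichment(background, phenotype, candidates))
```

In practice such a test would be repeated for every phenotype and corrected for multiple testing. The same test would then be applied to the known disease genes, so that phenotypes enriched in both sets can be compared.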