Gene-disease Association Mining Based On Modular Biomolecular Networks And Graph Neural Network

Posted on: 2024-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Li
Full Text: PDF
GTID: 2530307148463154
Subject: Computer technology
Abstract/Summary:
Mining the association between genes and diseases is of great significance for understanding the mechanisms of disease occurrence, improving the diagnosis of genetic diseases, and raising the level of treatment. However, clearly identifying gene-disease associations through biological experiments is very costly. With the rapid development of high-throughput sequencing technology, large-scale omics data are accumulating quickly, bringing new opportunities and challenges for studying the association between genes and diseases. This thesis therefore carries out the following research work:

(1) To account for the collaborative relationships between biomolecules, a gene-disease association mining method based on modular biological networks is proposed. Specifically, a modular biological network is constructed from protein-protein interaction networks, gene regulatory networks, protein complexes, and metabolic pathways, where protein complexes and metabolic pathways play a synergistic role in biomolecular interactions. To better characterize the module information, a method based on the Node2vec algorithm is designed for extracting initial feature information from the modular biological network. Furthermore, a heterogeneous relational graph neural network model is proposed to mine gene-disease associations. Finally, experimental results on the OMIM dataset show that the new model performs significantly better than traditional graph embedding learning models on multiple evaluation metrics.

(2) To better mine useful information from whole-genome sequencing data by considering the correlation between single nucleotide polymorphism (SNP) sites, a gene association mining method based on the Random Forest algorithm is proposed. Specifically, a Random Forest classifier is constructed from whole-genome sequencing data, and the correlation between SNP sites is obtained from the positional relationship between the decision nodes and leaf nodes of the decision trees in the Random Forest classifier. On this basis, gene associations are mined using the mapping between SNP sites and genes. Furthermore, the mined gene associations are added to the modular biological network to form a weighted biological molecular network, and a heterogeneous weighted graph neural network model is designed to predict pathogenic genes. Finally, the effectiveness of the weighted biological molecular network is validated on an Alzheimer's disease dataset from the DisGeNET database. Experimental results show that the weighted biological molecular network achieves better prediction performance on Alzheimer's disease, with an accuracy of 0.936 and an AUC of 0.730.
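As an illustration of the second contribution, the following is a minimal sketch of how SNP correlations might be read off the structure of decision trees and then lifted to gene pairs. The abstract does not state the exact rule used in the thesis, so the sketch assumes one simple possibility: two SNPs count as related each time they are both tested on the same root-to-leaf path. The tree encoding, the function names, the SNP ids, and the gene names are all hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tree encoding (an assumption for this sketch): an internal
# decision node is a dict with the SNP it splits on and two children;
# a leaf node is a plain class-label string.
def paths_to_leaves(node, prefix=()):
    """Yield the tuple of SNP ids tested on each root-to-leaf path."""
    if not isinstance(node, dict):      # reached a leaf node
        yield prefix
        return
    here = prefix + (node["snp"],)
    yield from paths_to_leaves(node["left"], here)
    yield from paths_to_leaves(node["right"], here)

def snp_pair_counts(forest):
    """Count how often two SNPs share a root-to-leaf path, over all trees.

    Assumed rule: one count per leaf whose path tests both SNPs.
    """
    counts = Counter()
    for tree in forest:
        for path in paths_to_leaves(tree):
            for a, b in combinations(sorted(set(path)), 2):
                counts[(a, b)] += 1
    return counts

def gene_pair_weights(snp_counts, snp_to_gene):
    """Aggregate SNP-pair counts into gene-pair weights via a SNP-to-gene map."""
    weights = Counter()
    for (a, b), n in snp_counts.items():
        ga, gb = snp_to_gene.get(a), snp_to_gene.get(b)
        if ga is None or gb is None or ga == gb:
            continue                    # skip unmapped SNPs and same-gene pairs
        weights[tuple(sorted((ga, gb)))] += n
    return weights

# Toy forest with made-up SNP ids and gene names.
forest = [
    {"snp": "rs1",
     "left": {"snp": "rs2", "left": "case", "right": "control"},
     "right": "control"},
    {"snp": "rs2",
     "left": "case",
     "right": {"snp": "rs3", "left": "case", "right": "control"}},
]
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C"}

snp_counts = snp_pair_counts(forest)
gene_weights = gene_pair_weights(snp_counts, snp_to_gene)
```

In a pipeline like the one the abstract describes, weights of this kind could serve as edge weights between gene nodes in the weighted biological molecular network. The counting rule and any normalization are design choices that this sketch leaves open.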
Keywords/Search Tags: Gene and disease association mining, Modular biological network, Graph neural network, Random forest, Whole genome sequencing data