
Research On The Prediction Of Non-coding RNA-disease Associations Based On Graph Neural Networks

Posted on: 2024-09-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q W Wu
GTID: 1524307352485034
Subject: Computer Science and Technology
Abstract/Summary:
Non-coding RNAs (ncRNAs) play important roles in epigenetic regulation and are closely related to the occurrence and development of complex human diseases, which makes them promising early molecular biomarkers and therapeutic targets. In recent years, thanks to the accumulation of biological experimental data and advances in information technology, data-driven computational methods represented by Graph Neural Networks (GNNs) have achieved good results in predicting ncRNA-disease associations. Although GNNs are good at mining latent information in non-Euclidean data, they still have deficiencies in aggregation over sparse data, in capturing deep neighborhood information, in model training, and in generalization. This dissertation therefore proposes several GNN methods for identifying ncRNAs closely related to the occurrence and development of diseases. These methods can provide an important reference and theoretical foundation for early diagnosis and drug development, and further enrich the theoretical framework of graph neural networks. The main contributions of this dissertation are as follows:

(1) To address the sparsity of node attribute features in sparse ncRNA-disease association networks, an ncRNA-disease association prediction method based on the Graph Auto-Encoder (GAE), named GAEML, is proposed. GAEML builds a biomolecular similarity network using a Top-k filtering rule, and constructs a richer association graph from the similarity network and the known association network. A simplified GAE is designed to learn effective node representations from the association graph, fusing network structure features with node attribute features. Given the role the GAE plays in the model, the reconstruction loss is redefined to reduce the negative impact of unknown edges in the association graph. Experimental results show that the optimized GAE enhances the representation ability of node embeddings and improves the model's ability to predict potential ncRNA-disease associations.

(2) To extract deep interaction relationships between ncRNAs and diseases, a novel ncRNA-disease association prediction method based on a multi-layer GNN, named MLGCN, is proposed. MLGCN uses residual connections to increase the depth of the GNN and incorporates shallow node features into deep node features through skip connections. Using an attention mechanism, the outputs of the GNN's convolution layers are combined in a weighted sum, so that the range of node neighborhood aggregation is controlled by learnable weights. The residual connections fuse local neighborhood features, while the attention mechanism integrates neighborhood features from a global perspective. Experimental results demonstrate that the model not only slows the onset of over-smoothing in deep GNNs but also learns higher-order neighborhood features, and effectively improves the performance of ncRNA-disease association prediction.

(3) To address the difficulty of model training caused by the limited association data between ncRNAs and diseases, an ncRNA-disease association prediction method based on a decoupled GNN, named MFGEML, is proposed. MFGEML transforms the association graph into a node similarity network using the distributional hypothesis from Natural Language Processing, obtains initial node features by matrix factorization, and then uses a lightweight graph convolution operation to aggregate the features of neighboring nodes and enhance the initial features. This method decouples feature transformation from propagation in the GNN, and yields a training-free node representation learning method built on parameter-free algorithms. Experimental results show that MFGEML performs well on multiple benchmark datasets for ncRNA-disease associations.

(4) To mine potential supervisory information from the ncRNA-disease association network itself, a graph self-supervised learning method for predicting ncRNA-disease associations, named SSLGRDA, is proposed. SSLGRDA employs two self-supervised learning (SSL) strategies to construct contrastive and generative models, exploring from different perspectives which self-supervised learning patterns suit ncRNA-disease association prediction. Two types of contrastive sub-models are designed: one contrasts network topology features with node attribute features, and the other contrasts local network features with global network features. The generative model learns node representations by reconstructing the original features of nodes. SSLGRDA uses two contrastive loss functions to adapt to different input graph data, and also introduces a supervised loss to promote the learning of node representations that help the prediction task. Experimental results on nine ncRNA-disease association datasets demonstrate that SSLGRDA has good generalization ability.
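
Two of the steps named in the abstract, Top-k filtering of a similarity network and parameter-free aggregation of neighbor features, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `top_k_filter` and `propagate`, the symmetrization of the filtered graph, the symmetric degree normalization, and the averaging over propagation depths are all assumptions chosen for illustration.

```python
import math


def top_k_filter(sim, k):
    """Keep, for each node, edges to its k most similar other nodes.

    `sim` is a square similarity matrix (list of lists). The result is a
    symmetric 0/1 adjacency matrix (symmetrization is an assumption).
    """
    n = len(sim)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other nodes by similarity to node i (self excluded).
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in ranked[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def propagate(adj, feats, layers=2):
    """Parameter-free neighbor aggregation with no trainable weights.

    Returns the mean of X, AX, A^2 X, ..., where A is the adjacency
    matrix normalized as D^(-1/2) A D^(-1/2) (an assumed normalization).
    """
    n = len(adj)
    dim = len(feats[0])
    deg = [sum(row) for row in adj]
    # A nonzero entry implies both endpoints have degree >= 1.
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]
    current = [row[:] for row in feats]
    total = [row[:] for row in feats]
    for _ in range(layers):
        # One propagation step: each node sums its normalized neighbors.
        current = [[sum(norm[i][j] * current[j][d] for j in range(n))
                    for d in range(dim)] for i in range(n)]
        total = [[total[i][d] + current[i][d] for d in range(dim)]
                 for i in range(n)]
    return [[v / (layers + 1) for v in row] for row in total]
```

In such a sketch, the initial features passed to `propagate` would come from a separate step (the abstract mentions matrix factorization), so feature construction and propagation stay decoupled and nothing is trained.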
Keywords/Search Tags:Non-coding RNA-disease, Association Prediction, Graph Neural Network, Auto-Encoder, Self-Supervised Learning