Many studies have revealed that long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) play crucial regulatory roles in disease development. Consequently, researchers regard them as reliable clinical markers or promising targets for therapeutic intervention. Exploring potential associations among lncRNAs, miRNAs and diseases is an essential part of the prevention, diagnosis, and treatment of diseases. Traditional laboratory experiments are costly and time-consuming. As a result, there is a practical need for computational methods that accelerate the experimental screening of potential lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs), and lncRNA-miRNA interactions (LMIs). Biological networks model biomolecular interactions as graphs, retaining a wealth of relational information. Recently, advances in computer technology have enabled the widespread application of deep learning in bioinformatics. Specifically, graph neural networks (GNNs), a deep learning approach for non-Euclidean (graph-structured) data, can model complex biological network relationships by incorporating network structure and node features for representation learning. Constructing GNN algorithms for the lncRNA-miRNA-disease heterogeneous graph (LMDHG) by integrating lncRNA, miRNA, and disease-related data to uncover potential interactions has become a hot research topic in bioinformatics. Therefore, we propose several improved GNN-based prediction methods for inferring relationships among lncRNAs, miRNAs and diseases, which provide reliable tools for screening disease-related lncRNA and miRNA candidate markers and thereby assist biological experimental research. The main research contents are as follows:

1. We propose a self-supervised embedding method based on graph contrastive learning, called GCLMTP, which overcomes the limitation that existing computational methods focus on a single prediction task. Unlike a single prediction task, multi-task prediction can incorporate the
regulatory interplay among lncRNAs, miRNAs and diseases, effectively enhancing the model’s generalization capability. The core idea is to predict LDAs, MDAs and LMIs by simultaneously extracting embedding representations of lncRNAs, miRNAs and diseases. To achieve this, GCLMTP constructs a triple-layer LMDHG that integrates the complex relationships between these entities based on their similarities and correlations. Next, an unsupervised embedding model based on graph contrastive learning is employed to extract latent topological features of lncRNAs, miRNAs and diseases from the LMDHG. The graph contrastive learning leverages graph convolutional network (GCN) architectures to maximize the mutual information between patch representations and the corresponding high-level summaries of the LMDHG. Finally, multiple classifiers are explored to predict LDA, MDA and LMI scores. Comprehensive experiments are conducted on two datasets derived from older and newer versions of the database, showing that GCLMTP outperforms baseline methods in predicting disease-related lncRNAs and miRNAs. Additionally, case studies on the two datasets further demonstrate the ability of GCLMTP to accurately discover new associations.

2. We propose a contrastive self-supervised GCN framework, CSGLMD, that combines supervised and self-supervised learning to predict potential relationships among lncRNAs, miRNAs and diseases. The approach performs multi-task prediction from a task-transfer perspective, integrating the advantages of supervised learning to overcome the limitation that self-supervised learning in the previous work, GCLMTP, extracts only static and generic features. It also exploits the benefits of self-supervised learning to alleviate the problem that the scarcity of labeled LDAs, MDAs and LMIs limits the predictive ability of traditional GCNs. Similar to GCLMTP, CSGLMD first leverages the rich association and similarity relationships among lncRNAs, miRNAs and diseases to construct an LMDHG that contains three
types of biological entities. This graph effectively embeds multi-source biological data and helps extend the model to other prediction tasks. In addition, a label instantiation mechanism is applied to adapt the LMDHG to GNN structures and to control the strength of similarity relationships between entities of the same type. Second, CSGLMD uses GCNs as the encoder to extract node embedding features from the LMDHG, and a multi-relational modeling decoder to predict LDAs, MDAs or LMIs. Finally, we design a contrastive self-supervised learning task that guides the learning of node embeddings without relying on labels, acting as a regularizer in the multi-task learning paradigm to enhance model generalization. Extensive results on two datasets (from the older and newer versions of the database, respectively) show that CSGLMD significantly outperforms state-of-the-art methods in predicting disease-associated lncRNAs and miRNAs. Case studies on the old and new datasets further demonstrate the capability of CSGLMD to discover new disease-related candidate lncRNAs and miRNAs.

3. We propose a multi-task prediction model called SSCLMD, which incorporates domain knowledge to identify potential LDAs, MDAs and LMIs. Building on the previous studies, the model combines the advantages of supervised and self-supervised learning to fully extract node features in different spaces by performing self-supervised contrastive learning on attribute and topology graphs. Additionally, two datasets are manually constructed to address the problem of missing domain knowledge for lncRNAs, miRNAs and diseases. First, their domain knowledge and interactions are exploited to construct the attribute graph and the topology graph, respectively. Then, node embeddings are learned in the attribute and topology spaces to extract specific and common features. Meanwhile, an attention mechanism is applied to adaptively fuse the embeddings from the different views. SSCLMD incorporates a contrastive self-supervised learning task as a
regularizer to guide the learning of node embeddings in both the attribute and topology spaces without relying on labels. Within the multi-task learning paradigm, this regularization improves the model’s generalization capability. Extensive experiments on the two manually curated datasets demonstrate that SSCLMD significantly outperforms other baseline methods in the LDA, MDA and LMI prediction tasks. Additionally, case studies on both the old and new datasets further support the ability of SSCLMD to uncover novel disease-related lncRNAs and miRNAs.

In summary, to address the limitations of existing computational methods for predicting relationships among lncRNAs, miRNAs and diseases, this paper aims to establish a unified multi-task prediction framework. It proposes three GNN prediction models based, respectively, on self-supervised learning, on combining supervised and self-supervised learning, and on incorporating domain knowledge. Compared with single-task prediction, these models leverage multi-source data more comprehensively and complement each other, thereby enhancing the accuracy and robustness of prediction. The three works build on each other progressively, together advancing research on identifying LDAs, MDAs and LMIs. They have strong application value as screening tools for disease-related lncRNA and miRNA biomarkers. Notably, the source data and code supporting this paper are publicly available on GitHub.
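
To make the graph contrastive objective described for GCLMTP more concrete (maximizing agreement between patch representations and a graph-level summary of the LMDHG), the following is a minimal NumPy sketch and not the authors' implementation. It assumes a Deep Graph Infomax style setup: a single GCN layer, feature row shuffling as the corruption, a mean readout, and a bilinear discriminator. All function names, matrix names, sizes and the random toy data are hypothetical, and only the forward pass of the loss is shown (no training loop).

```python
import numpy as np

def build_lmdhg(S_l, S_m, S_d, A_lm, A_ld, A_md):
    """Stack similarity blocks (diagonal) and association blocks (off-diagonal)
    into one symmetric adjacency over lncRNA, miRNA and disease nodes."""
    return np.block([
        [S_l,    A_lm,   A_ld],
        [A_lm.T, S_m,    A_md],
        [A_ld.T, A_md.T, S_d],
    ])

def normalize_adj(A):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, X, W):
    """One graph convolution followed by ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contrastive_loss(A_norm, X, W, W_disc, rng):
    """Binary cross-entropy between real and corrupted patch/summary pairs."""
    H_pos = gcn_layer(A_norm, X, W)                # patch representations
    X_corrupt = X[rng.permutation(X.shape[0])]     # corruption: shuffle feature rows
    H_neg = gcn_layer(A_norm, X_corrupt, W)
    s = sigmoid(H_pos.mean(axis=0))                # graph-level summary (mean readout)
    p_pos = sigmoid(H_pos @ W_disc @ s)            # bilinear discriminator scores
    p_neg = sigmoid(H_neg @ W_disc @ s)
    eps = 1e-9
    return -(np.log(p_pos + eps).mean() + np.log(1.0 - p_neg + eps).mean()) / 2.0

# Toy data (hypothetical sizes): 4 lncRNAs, 3 miRNAs, 2 diseases.
rng = np.random.default_rng(0)
n_l, n_m, n_d, dim = 4, 3, 2, 8

def random_similarity(n):
    """Random symmetric similarity matrix with ones on the diagonal."""
    S = rng.random((n, n))
    S = (S + S.T) / 2.0
    np.fill_diagonal(S, 1.0)
    return S

A_lm = rng.integers(0, 2, (n_l, n_m)).astype(float)  # lncRNA-miRNA interactions
A_ld = rng.integers(0, 2, (n_l, n_d)).astype(float)  # lncRNA-disease associations
A_md = rng.integers(0, 2, (n_m, n_d)).astype(float)  # miRNA-disease associations

A = build_lmdhg(random_similarity(n_l), random_similarity(n_m),
                random_similarity(n_d), A_lm, A_ld, A_md)
A_norm = normalize_adj(A)
X = rng.normal(size=(A.shape[0], dim))               # random initial node features
W = rng.normal(scale=0.1, size=(dim, dim))           # GCN weights
W_disc = rng.normal(scale=0.1, size=(dim, dim))      # discriminator weights
loss = contrastive_loss(A_norm, X, W, W_disc, rng)
print(f"contrastive loss: {loss:.4f}")
```

In a full pipeline, this loss would be minimized with respect to `W` and `W_disc` by gradient descent, and the resulting rows of `H_pos` would serve as the node embeddings passed to downstream classifiers for LDA, MDA and LMI scoring.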