Non-coding RNA (ncRNA) is RNA that cannot be translated into protein. MiRNAs, lncRNAs and circRNAs are three main kinds of ‘modern’ ncRNAs that perform multiple biological functions. As important regulatory factors, they participate in various cellular activities and are closely related to the occurrence and development of many human diseases, especially tumors. Because ncRNAs play major regulatory roles, recent disease research has focused on uncovering pathogenic mechanisms and disease biomarkers based on ncRNAs. Biological experiments have been used to identify associations between ncRNAs and diseases. However, such experiments are time-consuming and require considerable experience, so they are not suitable for large-scale association identification. To speed up experimental research, many computational methods have been proposed for predicting the diseases associated with a specific ncRNA. Existing methods share some common limitations: they suffer from noise interference and cannot capture effective high-level features of ncRNA-disease pairs. Heterogeneous relation networks preserve rich structural and semantic information, and an effective representation of such networks can improve a model's learning ability in downstream application tasks.

This dissertation addresses several ncRNA-disease association identification tasks. It proposes a series of computational methods based on network representation learning, which aim to fully learn ncRNA-disease association patterns from heterogeneous relation networks and thereby improve the accuracy and comprehensiveness of association prediction. Specifically, the following identification methods are proposed in this dissertation:

(1) For identifying lncRNA-disease associations, to address the problems that pair features contain noisy information and ignore the network neighborhood structure, we propose iLncRNAdis-FB, a method based on learning heterogeneous relation structure. The method integrates multi-source biological prior knowledge of lncRNAs and diseases to construct a heterogeneous relation network. We propose, for the first time, a 3D feature block representation strategy for the heterogeneous relation network to capture the neighborhood relationships between lncRNA and disease nodes. A convolutional neural network then reduces the noise in the 3D feature blocks and extracts high-level feature representations of lncRNA-disease pairs. Experimental results show that iLncRNAdis-FB learns highly discriminative features of lncRNA-disease pairs in the heterogeneous network and improves identification performance in multiple application scenarios.

(2) For identifying circRNA-disease associations, to address noise interference in the relation network and insufficient learning of heterogeneous relations, we propose iCircDA-GNMF, a method based on relation network reformulation and graph regularization. To reduce false-negative noise in the initial relation network, iCircDA-GNMF uses the neighbor interaction profiles of circRNAs and diseases in the heterogeneous relation network to update the association adjacency matrix in the horizontal and vertical directions, respectively. Furthermore, two graph regularization terms are added to the objective function so that similar circRNAs and similar diseases lie closer together in the latent subspace. Experimental results indicate that iCircDA-GNMF reduces noise and provides more information that benefits model learning. The learned subspace features are consistent with the interactions among biological entities, which improves the identification performance for circRNA-disease associations.

(3) For identifying circRNA-disease associations, to address the limited ability to predict associated diseases for new circRNAs, two methods are proposed, both of which treat circRNA-disease association identification as a search task. The first method, iCircDA-LTR, is built on Learning to Rank. iCircDA-LTR describes pair features from various perspectives and models the ranking of associated diseases for query circRNAs. During model optimization, more attention is paid to learning the experimentally verified circRNA-disease associations. In addition, to improve the accuracy and comprehensiveness of the relevance measure for circRNA-disease pairs, we propose a second method, iCircDA-FRM, which fuses multiple relation measure models. A heterogeneous relation network is constructed by considering the interactions between different biological entities, and different discriminant predictors based on this network are integrated in a supervised manner via the Learning to Rank framework. Experimental results demonstrate the effectiveness of iCircDA-LTR and iCircDA-FRM in predicting associated diseases for new circRNAs.

(4) For identifying associations between multiple classes of ncRNAs and cancers, to address the difficulty of learning multi-class association patterns from multi-source heterogeneous data, multiple types of heterogeneous relation information and the key network representation techniques used in single-class association identification are integrated from different perspectives, with the aim of identifying associations between miRNAs, lncRNAs, circRNAs and cancers simultaneously. First, we propose iNcRCA-HGAT, a method based on a hierarchical graph attention network. Ten kinds of relations between biological entities are considered, and three heterogeneous relation networks are constructed from seven meta-paths. Because the three heterogeneous relation networks differ in quality and in their ability to express node information, iNcRCA-HGAT uses a two-layer graph attention network to aggregate the multi-layer semantic information of nodes. In addition, because the features learned by different network representation algorithms are complementary, an ensemble method for identifying multi-class associations, iNcRCA-ENR, is proposed to further improve prediction performance. Building on iNcRCA-HGAT, iNcRCA-ENR integrates three other network representation algorithms to fuse attribute and structure information, and adopts a convolutional neural network to filter noise and extract high-level features of multi-class pairs. Compared with baseline methods, iNcRCA-HGAT and iNcRCA-ENR effectively capture the latent association patterns between multiple classes of ncRNAs and cancers and achieve better predictive performance.

In summary, this dissertation studies the problem of ncRNA-disease association identification and proposes various methods based on heterogeneous relation network representation. Experimental results show that the proposed methods can effectively detect potential ncRNA-disease associations, thereby providing candidate molecular markers for biological experiments and promoting early diagnosis and targeted drug development for diseases.
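The abstract does not spell out the objective or the reformulation rule used by iCircDA-GNMF, so the following Python/NumPy sketch is only an illustration of the two ideas it names, under stated assumptions. It assumes (i) a hypothetical WKNKN-style reformulation in which each circRNA row (horizontal direction) and each disease column (vertical direction) of the association matrix Y is replaced by a similarity-weighted average of the profiles of its k most similar neighbors, with known associations kept at 1; and (ii) the common graph-regularized NMF objective ||Y − ABᵀ||²_F + λ_c tr(AᵀL_cA) + λ_d tr(BᵀL_dB), where L_c = D_c − S_c and L_d = D_d − S_d are graph Laplacians built from circRNA and disease similarity matrices, minimized with multiplicative updates. The function names (`reformulate`, `gnmf`, etc.), the averaging rule and all parameter values are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def knn_profile_update(Y, S, k):
    """Replace each row of Y by a similarity-weighted average of the rows of
    its k most similar entities (self excluded).

    Hypothetical WKNKN-style rule; the dissertation's exact update may differ.
    """
    Y_new = np.zeros(Y.shape, dtype=float)
    for i in range(Y.shape[0]):
        sims = S[i].astype(float)  # astype returns a copy, S is untouched
        sims[i] = -np.inf          # never pick the entity itself
        nbrs = np.argsort(sims)[::-1][:k]
        w = S[i, nbrs]
        if w.sum() > 0:
            Y_new[i] = w @ Y[nbrs] / w.sum()
    return Y_new


def reformulate(Y, Sc, Sd, k=3):
    """Soft-fill the circRNA x disease matrix Y from neighbors in the
    horizontal (circRNA) and vertical (disease) directions."""
    Yc = knn_profile_update(Y, Sc, k)      # horizontal: rows are circRNAs
    Yd = knn_profile_update(Y.T, Sd, k).T  # vertical: columns are diseases
    # Known associations stay at 1; only unobserved entries get soft scores.
    return np.maximum(Y, (Yc + Yd) / 2.0)


def objective(Y, A, B, Sc, Sd, lam_c, lam_d):
    """||Y - A B^T||_F^2 + lam_c tr(A^T Lc A) + lam_d tr(B^T Ld B)."""
    Lc = np.diag(Sc.sum(axis=1)) - Sc
    Ld = np.diag(Sd.sum(axis=1)) - Sd
    fit = np.linalg.norm(Y - A @ B.T, "fro") ** 2
    return fit + lam_c * np.trace(A.T @ Lc @ A) + lam_d * np.trace(B.T @ Ld @ B)


def gnmf(Y, Sc, Sd, rank=3, lam_c=0.1, lam_d=0.1, n_iter=200, seed=0, eps=1e-10):
    """Graph-regularized NMF solved with multiplicative updates.

    Sc and Sd are assumed symmetric and non-negative. The graph terms pull
    the latent factors of similar circRNAs (rows of A) and similar diseases
    (rows of B) toward each other. Returns A, B and the objective history.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((Y.shape[0], rank))  # circRNA latent factors
    B = rng.random((Y.shape[1], rank))  # disease latent factors
    Dc = np.diag(Sc.sum(axis=1))
    Dd = np.diag(Sd.sum(axis=1))
    history = [objective(Y, A, B, Sc, Sd, lam_c, lam_d)]
    for _ in range(n_iter):
        # Ratio of the negative to the positive part of each gradient.
        A *= (Y @ B + lam_c * Sc @ A) / (A @ B.T @ B + lam_c * Dc @ A + eps)
        B *= (Y.T @ A + lam_d * Sd @ B) / (B @ A.T @ A + lam_d * Dd @ B + eps)
        history.append(objective(Y, A, B, Sc, Sd, lam_c, lam_d))
    return A, B, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = (rng.random((6, 5)) > 0.7).astype(float)   # toy circRNA x disease matrix
    Sc = rng.random((6, 6)); Sc = (Sc + Sc.T) / 2  # toy symmetric similarities
    Sd = rng.random((5, 5)); Sd = (Sd + Sd.T) / 2
    Y_hat = reformulate(Y, Sc, Sd, k=2)
    A, B, hist = gnmf(Y_hat, Sc, Sd, rank=3)
    scores = A @ B.T  # higher score = more likely association
    print(np.round(scores, 2))
```

In this sketch, each row of `A @ B.T` can be sorted to rank candidate diseases for a circRNA. Keeping observed associations fixed at 1 in `reformulate` reflects the stated goal of correcting only false-negative (unobserved) entries; a different fusion rule for the horizontal and vertical updates would be an equally plausible assumption.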