| Non-coding RNAs (ncRNAs) can be classified into two categories based on their length. Long non-coding RNAs (lncRNAs) are longer than 200 nt, while those shorter than 50 nt are referred to as short non-coding RNAs, which include microRNA (miRNA), siRNA, and piRNA. The complex and fine-grained regulatory functions of lncRNAs offer researchers a new perspective on the nature of genomic complexity and a new way to understand the complexity of life. A growing number of lncRNAs have been shown to be closely related to human diseases, yet compared with the rapidly increasing number of newly discovered lncRNAs, only a few have been reported to be associated with diseases. Developing effective computational methods to predict potential lncRNA-disease associations is therefore both challenging and urgently needed. In recent years, several methods have been proposed that predict new lncRNA-disease associations by calculating the association probability of lncRNA-disease pairs, which can significantly reduce the time and cost of biological experiments. This work proposes solutions to several problems in lncRNA-disease association prediction. The main contributions are as follows: (1) To address insufficient information in the lncRNA network, the need to extract meta-paths manually in heterogeneous graphs, insufficient use of the information within meta-paths, identical attention calculations for heterogeneous and homogeneous subgraphs, and the class imbalance between positive and negative examples, a meta-path-based heterogeneous graph attention network for lncRNA-disease association prediction, called MAGLDA, is proposed. The method constructs an lncRNA-miRNA-disease association network and combines lncRNA expression similarity with Gaussian interaction profile kernel similarity to build the lncRNA embedding features that are input to the model. A similar approach is used to obtain the
embedding features of the disease input model. Cosine similarity is computed over the resulting lncRNA or disease embedding features, and the KNN algorithm is applied to select the top k nodes with which to construct edges, yielding the lncRNA-lncRNA similarity network or the disease-disease similarity network. The model generates and encodes the required meta-paths, applies different attention mechanisms to homogeneous and heterogeneous subgraphs, and finally uses neural inductive matrix completion to reconstruct the lncRNA-disease association matrix; a cost-sensitive function is introduced into the loss function. Experiments show that MAGLDA outperforms the compared methods, and a case study and an ablation study confirm its effectiveness. (2) To address the insufficient use of lncRNA sequence features and of the structural information in heterogeneous graphs, an lncRNA-disease association prediction method based on supervised collaborative graph contrastive learning, called SHGCLDA, is proposed. First, a heterogeneous graph with three types of nodes (lncRNA, miRNA, and disease) is constructed. An automatic meta-path selection mechanism then transforms the heterogeneous graph into new graph structures defined by the meta-paths. Attention mechanisms at the node-feature level and the meta-path level are combined to learn low-dimensional representations of lncRNAs and diseases. At the same time, lncRNA sequence information is processed into sequence features, which are combined with lncRNA functional similarity to form the lncRNA input features. Next, a topology graph and a semantic graph of lncRNA-disease pairs are constructed, and the final prediction objective is optimized through collaborative contrastive learning across the two graphs. Finally, the squash function of the capsule network is applied for the final classification. The results of 5-fold cross-validation show that SHGCLDA outperforms the compared methods. Meanwhile, the case
study shows that the model can effectively predict lncRNAs associated with three common diseases: hepatocellular carcinoma, colorectal cancer, and breast cancer. In addition, further ablation studies and analyses confirm the effectiveness of each component of the model, which together yield good overall performance. |
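
The similarity-network construction described for MAGLDA (Gaussian interaction profile kernel similarity, then cosine similarity with top-k neighbour selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy association matrix, the bandwidth normalisation (one divided by the mean squared profile norm, a common convention for this kernel), and the symmetrisation of the KNN graph are all assumptions made for illustration. In the method as described, the embedding features would also include expression similarity, which is omitted here.

```python
import numpy as np


def gip_kernel_similarity(assoc: np.ndarray) -> np.ndarray:
    """Gaussian interaction profile kernel similarity between the rows of assoc.

    Each row is the binary association profile of one entity (e.g. one lncRNA).
    Assumption: bandwidth = 1 / mean squared profile norm.
    """
    sq_norms = np.sum(assoc ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    # Pairwise squared Euclidean distances between profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (assoc @ assoc.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


def cosine_knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Link each node to its k most cosine-similar nodes; return a 0/1 adjacency."""
    n = features.shape[0]
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.maximum(norms, 1e-12)
    sim = normalized @ normalized.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    top_k = np.argsort(-sim, axis=1)[:, :k]  # indices of the k largest similarities
    adj = np.zeros((n, n), dtype=np.int8)
    adj[np.repeat(np.arange(n), k), top_k.ravel()] = 1
    # Assumption: treat the similarity network as undirected.
    return np.maximum(adj, adj.T)


# Hypothetical toy data: 4 lncRNAs (rows) x 3 diseases (columns).
assoc = np.array(
    [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0],
     [0, 1, 1]],
    dtype=float,
)
gip = gip_kernel_similarity(assoc)  # stands in for the lncRNA embedding features
adj = cosine_knn_graph(gip, k=1)    # lncRNA-lncRNA similarity network
print(adj)
```

Applying the same two functions to the transposed association matrix would give a disease-disease similarity network in the same way.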