
Prediction Of LncRNA-disease Associations Based On Representation Learning At Different Levels

Posted on: 2022-02-09 | Degree: Master | Type: Thesis
Country: China | Candidate: N Sheng | Full Text: PDF
GTID: 2518306320466674 | Subject: Computer Science and Technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are widely involved in various human diseases, and identifying disease-related lncRNAs is important for understanding the pathogenesis of complex diseases. However, verifying lncRNA-disease associations through biological experiments is usually expensive and time-consuming. Effective computational methods for predicting lncRNA-disease associations can therefore provide biologists with reliable disease-related lncRNA candidates, reducing the cost and improving the efficiency of biological experiments. Predicting these associations accurately remains an open research problem. This thesis proposes three computational methods that predict lncRNA-disease associations at different levels: the lncRNA-disease pair level, the lncRNA and disease node level, and the lncRNA-disease heterogeneous graph level. For the problems specific to each level, a prediction method combining machine learning and deep learning is proposed. Experimental results show that all three models outperform previous methods. The main work is as follows:

(1) At the lncRNA-disease pair level, this thesis proposes a prediction model, VADLP, that extracts, encodes, and adaptively integrates multi-level representations. First, a triple-layer heterogeneous graph is constructed with weighted inter-layer and intra-layer edges to integrate the similarities and correlations among lncRNAs, diseases, and miRNAs. Three representations are then defined: node attributes, pairwise topology, and feature distribution. Node attributes are derived from the graph by an embedding strategy to represent lncRNA-disease associations, which are inferred via the lncRNAs, diseases, and miRNAs they have in common. Pairwise topology is formulated by a random walk algorithm and encoded by a convolutional autoencoder to represent the hidden topological relations between an lncRNA and a disease. The feature distribution is modelled by a variance autoencoder to reveal the underlying lncRNA-disease relationship. Finally, an attentional representation-level integration module adaptively fuses the three representations for lncRNA-disease association prediction. The model is tested on a public dataset; across multiple evaluation metrics and case study analysis, it outperforms five state-of-the-art lncRNA-disease prediction models.

(2) At the lncRNA and disease node level, this thesis proposes SAADLP, a prediction model based on multi-scale attention and an adversarial autoencoder. First, following the same strategy as VADLP, the similarities and correlations among lncRNAs, miRNAs, and diseases are used to construct a triple-layer heterogeneous graph, from which the feature vectors of lncRNA and disease nodes are extracted. Then, an autoencoder based on multi-scale attention adaptively assigns different weights to nodes at different scales, thereby effectively learning and fusing the first-order and multi-order neighbor relationships of lncRNA and disease nodes. Adversarial regularization training matches the latent posterior distribution to a given prior distribution, making the low-dimensional node representations learned by the autoencoder more robust and informative. Finally, the Light Gradient Boosting Machine (LightGBM) is used as a classifier to predict the association score between an lncRNA and a disease. Experimental results show that SAADLP outperforms five state-of-the-art lncRNA-disease association prediction methods. In addition, case studies of three diseases further demonstrate the capability of SAADLP to discover potential lncRNA-disease associations.

(3) At the lncRNA-disease heterogeneous graph level, this thesis proposes MCGDLP, a prediction model based on a multi-channel graph autoencoder. First, the similarities and correlations among lncRNAs, miRNAs, and diseases are used to construct a triple-layer heterogeneous graph. Then, graph convolutional networks extract the topological structure information and the specific attribute information of lncRNA and disease nodes from the lncRNA-miRNA-disease heterogeneous graph and the corresponding subgraphs, respectively. In addition, a cross-embedding integration mechanism fuses the structural and attribute representations, and a combined training strategy optimizes the entire model to ensure the consistency of the learned embeddings. Finally, LightGBM is used to predict potential lncRNA-disease associations. The experimental results show that MCGDLP outperforms other methods in both AUC and AUPR and predicts lncRNA-disease associations more accurately. In addition, case studies on three diseases further confirm that MCGDLP is able to discover potential disease-related lncRNA candidates.
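The abstract states that pairwise topology in VADLP is formulated by a random walk over the lncRNA-miRNA-disease graph, but it does not say which variant is used. As an illustration only, the sketch below assumes random walk with restart, a common choice for this kind of network. The toy graph, its edge weights, the node names, and the restart probability of 0.3 are all hypothetical and are not taken from the thesis.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Approximate the steady-state visiting probabilities of a walker that starts at `seed`."""
    n = len(adjacency)
    transition = row_normalize(adjacency)
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    scores = start[:]
    for _ in range(max_iter):
        # Follow an edge with probability (1 - restart), otherwise jump back to the seed.
        updated = [
            (1 - restart) * sum(scores[j] * transition[j][i] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical toy graph: two lncRNAs, one miRNA, two diseases.
nodes = ["lncRNA_A", "lncRNA_B", "miRNA_X", "disease_P", "disease_Q"]
adjacency = [
    [0.0, 0.5, 1.0, 1.0, 0.0],  # lncRNA_A: similar to B, linked to X and P
    [0.5, 0.0, 1.0, 0.0, 0.0],  # lncRNA_B: similar to A, linked to X
    [1.0, 1.0, 0.0, 0.0, 1.0],  # miRNA_X: linked to A, B and Q
    [1.0, 0.0, 0.0, 0.0, 0.4],  # disease_P: linked to A, similar to Q
    [0.0, 0.0, 1.0, 0.4, 0.0],  # disease_Q: linked to X, similar to P
]

scores = random_walk_with_restart(adjacency, seed=0)
for name, score in zip(nodes, scores):
    print(f"{name}: {score:.4f}")
```

Running the walk from an lncRNA node and again from a disease node gives two score vectors over the same node set. Vectors of this kind are one plausible input for a pair-level encoder such as the convolutional autoencoder described in (1), although the thesis may construct its topology features differently.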
Keywords/Search Tags:Multi-level representation learning, LncRNA-disease association prediction, Convolutional autoencoder, Variance autoencoder, Adversarial autoencoder, Graph convolutional network