| Long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides. In recent years, a large number of studies have shown that lncRNAs play an important role in many basic biological processes and have an important impact on the occurrence and development of diseases, especially cancer. Although lncRNAs are recognized as one of the pathogenic factors, the molecular mechanisms of complex human diseases are still not fully understood from the lncRNA perspective. Predicting potential lncRNA-disease associations can help researchers better understand disease pathogenesis, and can also support the development of new drugs and the formulation of personalized diagnosis and treatment for a variety of complex human diseases. In recent years, computational prediction models have gradually become a research hotspot in bioinformatics. Their advantage is that they do not rely on expensive and time-consuming biological experiments and can predict potential lncRNA-disease associations quickly and effectively. In this article, we build a biomolecular association network model to predict potential associations between lncRNAs and diseases. Using both the attribute information and the behavior information of lncRNAs and diseases, we propose two prediction methods: DWLDA, based on the shallow neural network model DeepWalk, and GRLDA, based on the matrix factorization model GraRep. Specifically, we first take lncRNAs, miRNAs, drugs, proteins, and diseases as the nodes of a molecular association network and use nine types of known associations (lncRNA-disease, miRNA-lncRNA, miRNA-disease, miRNA-protein, lncRNA-protein, protein-disease, drug-protein, drug-disease, and protein-protein) as its edges. Then, we combine attribute information (lncRNA sequence information and disease semantic similarity) with behavior information (obtained through the network embedding method GraRep in GRLDA and DeepWalk in DWLDA) to represent each known lncRNA-disease association numerically. Next, lncRNA-disease associations obtained from different databases are used as positive samples, and an equal number of unknown associations are randomly selected as negative samples to form the training data set. Finally, a random forest classifier, chosen for its strong overall performance, is used for training, validation, and testing. To evaluate the overall performance of the two proposed methods, we performed five-fold cross-validation on both models, and the results were excellent. Comparisons across different features and different classifiers also showed that the two prediction models achieved better results. In addition, to further verify the effectiveness of the prediction algorithms, we conducted case studies on three types of cancer and confirmed the top ten lncRNA candidates selected by the prediction models in some representative databases. Overall, the experimental results show that both proposed methods perform well in predicting lncRNA-disease associations. |
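
The data-preparation steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the node names, the toy edge list, the helper functions (`build_graph`, `random_walks`, `sample_negatives`, `five_fold`), and the walk parameters are all hypothetical. It covers only the truncated random walks that DeepWalk would feed to a skip-gram model, the balanced negative sampling, and a plain five-fold split. The skip-gram training itself, GraRep, the sequence and semantic-similarity features, and the random forest classifier are omitted.

```python
import random
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(neighbors) for node, neighbors in adj.items()}

def random_walks(adj, walk_length, walks_per_node, rng):
    """Truncated uniform random walks (the corpus stage of DeepWalk)."""
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def sample_negatives(lncrnas, diseases, positives, rng):
    """Draw as many unknown lncRNA-disease pairs as there are positives."""
    known = set(positives)
    candidates = [(l, d) for l in lncrnas for d in diseases
                  if (l, d) not in known]
    return rng.sample(candidates, len(known))

def five_fold(samples, rng, k=5):
    """Shuffle the samples and partition them into k disjoint folds."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

# Toy data: every identifier below is made up for illustration.
positives = [("lnc1", "dis1"), ("lnc2", "dis2"), ("lnc3", "dis1")]
other_edges = [("mir1", "lnc1"), ("mir1", "dis2"), ("mir1", "dis3"),
               ("prot1", "lnc2"), ("prot1", "dis1"),
               ("drug1", "prot1"), ("drug1", "dis2")]

rng = random.Random(0)
adj = build_graph(positives + other_edges)
walks = random_walks(adj, walk_length=5, walks_per_node=2, rng=rng)
negatives = sample_negatives(["lnc1", "lnc2", "lnc3"],
                             ["dis1", "dis2", "dis3"], positives, rng)
samples = [(p, 1) for p in positives] + [(n, 0) for n in negatives]
folds = five_fold(samples, rng)
```

In a full pipeline, each fold would in turn serve as the test set while the remaining four train the classifier, and the walks would be passed to a skip-gram model to obtain the node embeddings used as behavior features.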