| Evidence shows that long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are two typical types of non-coding RNAs (ncRNAs), and that their interactions play an important regulatory role in many biological processes. Exploring unknown lncRNA-miRNA interactions not only helps us better understand the functional expression of lncRNAs and miRNAs but also provides new research directions and ideas for biological and medical research. At present, identifying lncRNA-miRNA interactions depends mainly on biological experiments, which are often time-consuming and labor-intensive, so a computational model for predicting these interactions is needed. In recent years, a growing number of network algorithms and deep learning frameworks have been applied to this prediction task. Although these methods can predict lncRNA-miRNA interactions, they cannot accurately depict the interaction relationships between human lncRNAs and miRNAs. In this thesis, we propose a model called GCNCRF, based on graph convolutional networks and conditional random fields, that is designed specifically to predict potential human lncRNA-miRNA interactions. First, we collate the required data sets DAT1 and DAT2 from the LncRNASNP2, GENCODE and miRBase databases, and then construct a heterogeneous network from the known lncRNA-miRNA interactions, the comprehensive similarity networks of lncRNAs and miRNAs, and the feature matrices of lncRNAs and miRNAs. Second, a graph convolutional network learns from the input node information to obtain initial node embeddings. A conditional random field layer placed in the hidden layer of the graph convolutional network updates the embeddings so that similar nodes have similar embeddings. An attention mechanism is also added to the conditional random field layer to reassign node weights, so that the model better captures the characteristics of important nodes and ignores some less influential ones. Finally, a decoding layer decodes and scores the final embeddings. In 5-fold cross-validation, GCNCRF obtains AUC, ACC, RE and SP values of 0.9470, 0.9814, 0.8795 and 0.9254, respectively, on DAT1. To further verify the generalization ability of GCNCRF, we carry out experiments on DAT2, a data set of a different order of magnitude. On DAT2, GCNCRF obtains AUC, ACC, RE and SP values of 0.887, 0.9724, 0.9245 and 0.7933, respectively. On both DAT1 and DAT2, GCNCRF achieves higher prediction accuracy than six other mainstream methods. We hope that GCNCRF can become an effective biomedical research tool for predicting potential human lncRNA-miRNA interactions. |
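
The four reported metrics can be illustrated with a minimal sketch. It assumes that RE denotes recall (sensitivity) and SP denotes specificity, which the abstract does not state explicitly, and that predicted scores are binarized at a hypothetical threshold of 0.5. The function names `confusion_counts`, `auc` and `evaluate` are illustrative only and are not taken from the GCNCRF implementation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute ACC, RE (assumed recall), SP (assumed specificity) and AUC."""
    # The threshold is an assumption made for this sketch.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / len(y_true),
        "RE": tp / (tp + fn) if (tp + fn) else 0.0,
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,
        "AUC": auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    labels = [1, 1, 1, 0, 0, 0]
    predicted = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print(evaluate(labels, predicted))
```

In a 5-fold cross-validation setting, a function like `evaluate` would be applied to the held-out fold in each round and the five results averaged. The pairwise AUC is quadratic in the number of samples, so a rank-based implementation would be preferable for data sets of the size discussed here.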