Related studies have shown that long non-coding RNAs (lncRNAs) are involved in cellular development and metabolic processes, and that their subcellular location is important for treating some cancers and for life-science research. Traditional biological experiments locate lncRNAs accurately, but they suffer from long experimental cycles and high equipment costs. In recent years, researchers have applied computational methods to predict the subcellular location of lncRNAs with some success, but two problems remain. First, traditional machine learning-based methods usually spend substantial resources on feature extraction and fusion. Second, deep learning-based methods are limited by the small number of samples and the imbalanced class distribution of the datasets, so they cannot take full advantage of their ability to extract high-level features automatically. This paper attempts to predict the subcellular locations of lncRNAs from the perspective of non-Euclidean space. The main work consists of two parts.

In the first part, a graph structure-based prediction method is proposed for lncRNA subcellular localization. It covers data preparation, feature extraction, balancing of the sample distribution, construction of the graph structure, and the prediction model. First, for graph construction, a more intuitive fixed-value undirected graph is proposed based on the similarity between nodes. Second, to address the large number of isolated nodes in this graph structure, a quantitative undirected graph is proposed. Finally, to reduce information redundancy and save the time and space costs of the experiments, a quantitative directed graph is proposed. Based on these three graph structures, three prediction models based on graph convolutional networks (GCNs) are proposed, namely vu-GCN, nu-GCN and nd-GCN, and used to predict the subcellular location of lncRNAs. The experimental results show that the nd-GCN model performs best (F1 score of 0.899, recall of 0.897, and accuracy of 0.877), which demonstrates the effectiveness of GCN-based prediction models and shows that deep learning methods can extract high-level features of RNA sequences automatically.

In the second part, the GM-lncLoc algorithm, which combines a graph neural network with meta-learning, is proposed to address the small sample size of the dataset and to predict the subcellular location of lncRNAs more accurately. Meta-learning handles the few-shot learning problem effectively, but it is trained on a series of tasks, which conflicts with the data format of the original whole-graph structure. To resolve this conflict, each node and its neighboring nodes are extracted from the original graph structure to form a local graph. Meta-knowledge is then trained following the meta-learning training pattern to improve the training efficiency and prediction ability of the graph convolutional network. In experimental comparisons with existing methods, GM-lncLoc achieves the best performance on 2 benchmark datasets and 1 independent test set. Specifically, under 10-fold cross-validation, the accuracy of GM-lncLoc on the 2 benchmark datasets is 94.6% and 94.7%, respectively, while its accuracy on the independent test set is 50.4%.

To sum up, the prediction model based on graph meta-learning proposed in this paper performs better than existing methods and can facilitate subcellular localization studies of lncRNAs in bioinformatics.
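The graph construction and local-graph extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the "fixed-value" graph is read as a similarity threshold, the "quantitative" graphs are read as linking each node to a fixed number of most similar nodes, cosine similarity is used as the similarity measure, and a local graph is taken to be a node plus its direct neighbours. The function names and the parameters `tau` and `k` are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def threshold_graph(features, tau):
    """Undirected graph linking i and j when similarity >= tau.

    Assumed reading of the 'fixed-value undirected graph'.
    Nodes with no sufficiently similar partner stay isolated.
    """
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= tau:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def knn_graph(features, k, directed):
    """Link every node to its k most similar nodes.

    Assumed reading of the 'quantitative' graphs. With directed=True only
    the edge i -> j is stored; otherwise the reverse edge is added as well.
    """
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
        for j in others[:k]:
            adj[i].add(j)
            if not directed:
                adj[j].add(i)
    return adj


def local_subgraph(adj, node):
    """Induced subgraph on a node and its direct neighbours (assumed 1-hop)."""
    members = {node} | adj[node]
    return {m: adj[m] & members for m in members}


# Toy feature vectors standing in for extracted lncRNA sequence features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.2]]
fixed_value = threshold_graph(features, tau=0.9)
quantitative_directed = knn_graph(features, k=1, directed=True)
task_graph = local_subgraph(quantitative_directed, 0)
```

In this toy example the threshold graph leaves two nodes without any edge, while the fixed-neighbour graph gives every node exactly `k` outgoing edges, which mirrors the motivation given for moving from the first graph structure to the second. Storing only the outgoing edges in the directed variant keeps the edge count at `n * k`.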