| High-throughput transcriptome sequencing technology is expected to discover new protein-coding and non-coding transcript sequences,especially long non-coding RNAs(lncRNA)identified from de novo sequencing technology.LncRNAs are RNA sequences with a length greater than 200nt and non-coding protein capability.LncRNAs play important biological functions in many stages,such as transcriptional and life process regulation.Accurately identifying lncRNAs from massive genome sequencing data lays the foundation for exploring their internal structure and predicting their functions in the future.The identification of animal and plant lncRNAs is difficult due to their small sample size,poor genome annotation,and complex structure.In this paper,lncRNAs were identified from animal and plant RNA based on deep learning algorithm.The main research content and innovative points of this article include the following three parts:(1)A proposed lncRNA recognition method based on attention mechanism is presented in this paper.This method utilizes a soft attention mechanism and multilayer perceptron that integrates k-mer usage frequency and ORF length,to construct a classification model for human lncRNA identification.The Lnc-Attention model achieves a high accuracy of 0.964 on the test set,which outperforms other traditional machine learning classification algorithms and deep learning-based classification algorithms,indicating higher identification accuracy.(2)Based on convolutional neural network,an effective method for identifying lncRNA,named Coding-Net classification method,is proposed.This method directly inputs the features of RNA sequences into the model for training,and uses 1 × 3 asymmetric convolutional kernels to extract classification information,which improves the prediction accuracy.It achieves an accuracy of 0.993 on the human dataset,which is better than traditional machine learning-based and existing deep learning-based lncRNA identification methods.In addition,the Coding-Net method has good universality in cross-species prediction,with accuracy greater than 0.9 in five species.Finally,this method is particularly suitable for primate animals,with prediction accuracies greater than 0.94.(3)Based on Coding-Net classification method,a novel method named "PlantCoding-Net"was proposed to predict plant lncRNA classification.This method selects the model plant Arabidopsis thaliana with rich gene annotations as the training data set,and identifies plant lncRNA transcript sequences based on the deep learning classification model.The model achieved an accuracy rate of 0.957 on the test set.Compared with the CPC2,PLEK and LncRNA_Mdeep plant lncRNA identification tools,the identification accuracy rates were increased by 4.1,20 and 3.6 percentage points,respectively.This paper proposed two deep learning-based methods for identifying animal lncRNA transcripts,Lnc-Attention and Coding-Net,which lay the foundation for further research into the functions of human lncRNAs.The proposed PlantCoding method can accurately identify plant lncRNA transcripts,enabling a better understanding of the structure and function of the plant genome and providing deeper insights for plant genome research.This paper proposed two deep learning-based methods have their own advantages.The Lnc-Attention classification method is suitable for the case of large amount of prediction data.Because the model network structure of Coding-Net method is more complex,the training parameters are more,and the prediction time and memory are more,it is suitable for the case of less prediction data. |