Background: In traditional RNA research, an RNA is classified exclusively as either messenger RNA (mRNA) or non-coding RNA (ncRNA) according to its coding ability. However, recent studies have discovered an unusual group of RNAs in animals, plants and bacteria that can serve as both protein-coding and non-coding RNA. RNAs endowed with both protein-coding and non-coding functions are referred to as "dual functional lncRNAs" or "cncRNAs" (coding and non-coding RNAs).

Objective: Although dual functional lncRNAs have attracted extensive attention in recent years and substantial experimental evidence has been collected, to our knowledge no study has yet addressed their identification. An efficient and accurate computational framework for identifying dual functional lncRNAs is therefore urgently needed.

Methods: In this study, building on the cncRNAdb database, we developed a bioinformatics algorithm for predicting dual functional lncRNAs based on a multi-head self-attention model. The sequences of dual functional lncRNAs were screened from the cncRNAdb database, and the sequences of human lncRNAs were downloaded from the Ensembl database. After sequence preprocessing, a benchmark dataset was constructed. The model embeds an attention module and a multi-layer perceptron (MLP); to reduce over-fitting and vanishing gradients, we applied dropout and ensemble strategies. We evaluated the performance of the model on the benchmark dataset using 5-fold cross-validation. To test the stability and scalability of the model, we also evaluated it on independent datasets and cross-species datasets. The independent datasets were derived from a combined analysis of transcriptome, translation and protein data from four leukemia cell lines, and the cross-species data were derived from the cncRNAdb database. Because the independent test set comes from specific cell lines and the amount of cross-species data is small, we used the area under the recovery curve in these evaluations. Finally, to explore cellular mechanisms and functions, the sequence specificity of dual functional lncRNAs was analyzed on the MEME web service platform. The specific sequence fragments were annotated with known human RNA-binding proteins (RBPs), and the regulatory mechanisms and functions that may be affected were explored.

Results: On the benchmark dataset, LncReader showed multiple advantages over various classical machine learning methods. On the independent datasets, our model still performed better than the other models. Notably, LncReader also achieved the best performance on the mouse and Drosophila datasets. Finally, sequence specificity analysis of the dual functional lncRNA dataset identified three significant position weight matrices (motifs) annotated with RNA-binding proteins. In summary, given the current lack of prediction methods for dual functional lncRNAs, this study developed the LncReader algorithm to rapidly identify them, providing important technical support for research on RNA classification, function and evolution.
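
As an illustration of the 5-fold cross-validation step named in the methods, the following minimal sketch splits a labeled dataset into five folds. It is not taken from the LncReader implementation: the function name `stratified_kfold`, the use of stratification (keeping the positive/negative ratio similar in every fold), the random seed, and the toy label counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps class ratios roughly equal.

    Hypothetical helper for illustration only, not part of LncReader.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels (made-up counts): 1 = dual functional lncRNA, 0 = other lncRNA.
labels = [1] * 20 + [0] * 80
splits = list(stratified_kfold(labels, k=5, seed=42))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    n_pos = sum(labels[i] for i in test_idx)
    print(f"fold {fold_no}: train={len(train_idx)} "
          f"test={len(test_idx)} positives_in_test={n_pos}")
```

With these toy counts, every fold holds 20 test samples, 4 of them positive, and each sample appears in exactly one test fold. In practice the model would be trained on `train_idx` and scored on `test_idx` for each fold, with the five scores averaged.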