The growth of epigenetics has spawned many branches of bioinformatics research, and RNA chemical modification is among the most widely studied. Research in this area focuses mainly on detecting and sequencing modification sites and on how different modifications affect heredity. More than 160 RNA chemical modifications have been identified to date, with significant effects on RNA pairing, splicing, translation, and transcript stability. Before computational methods were developed, RNA chemical modifications were detected mainly by experimental techniques such as high-throughput sequencing and mass spectrometry. Although these techniques can locate modification sites, the experiments are generally costly in both time and money, and detection is inefficient. It is therefore necessary to develop computational algorithms for detecting RNA chemical modifications. In addition, classifying imbalanced data is a key difficulty in this problem, because most traditional machine learning models assume balanced classes. This study extracts informative features from the sequences of two RNA chemical modifications (m6A and m6Am) and handles class imbalance with class weights in deep learning models, achieving good predictive performance.

N6-methyladenosine (m6A) is a post-transcriptional RNA modification and one of the typical forms of RNA methylation. It plays an important role in mRNA editing and degradation. For m6A, this study proposes m6A-CNLs, a multi-branch convolutional neural network (CNN) model for predicting m6A sites. The model first encodes each RNA sequence with three encoding methods and feeds each encoding into its own CNN, which yields three new feature sets. These feature sets are concatenated into a new feature space, and the final classification is made on that space. An LSTM is also attached to each CNN branch to capture contextual information. On the imbalanced m6A dataset studied here, m6A-CNLs achieved a sensitivity (Sn) of 0.782, specificity (Sp) of 0.968, accuracy (ACC) of 0.951, and Matthews correlation coefficient (MCC) of 0.719 on the independent test set. It also performed well in cross-validation, which indicates that the model predicts m6A sites reliably.

N6,2′-O-dimethyladenosine (m6Am) is a more recently discovered reversible RNA modification that has an important impact on the mRNA life cycle, but its biological function has not yet been sufficiently explored. This study therefore combines a Transformer with a bidirectional GRU (Bi-GRU), extracts features through sequential natural-number encoding, and proposes m6Am Twins, a new end-to-end "twin" deep learning network. Compared with many existing algorithms, the model performs significantly better on two imbalanced datasets. On the independent test sets, Sn, Sp, ACC, and MCC were 0.709, 0.921, 0.902, and 0.53 for the full-transcript dataset and 0.645, 0.945, 0.918, and 0.545 for the mature RNA dataset. Cross-validation results on the training sets further show that the model generalizes well.

The m6A and m6Am sites studied here are both current hot spots in RNA chemical modification research. This study draws on the biological characteristics and evolutionary patterns of the two modifications to develop prediction models for their imbalanced site data. These models may help with imbalanced classification in this field and with research on the biological functions of RNA chemical modifications.
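The abstract says that class imbalance is handled with class weights, but it does not give the weighting rule. One common choice is the "balanced" heuristic, which is assumed here. It gives class c the weight w_c = N / (K * n_c), where N is the number of samples, K the number of classes, and n_c the size of class c. The minimal sketch below shows that heuristic together with a class-weighted binary cross-entropy. The function names are hypothetical and are not taken from the thesis code.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: N / (K * n_c)}.

    Assumption: the 'balanced' heuristic; the thesis does not state its rule.
    """
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}


def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


# Made-up toy labels with one positive per nine negatives
labels = [1] + [0] * 9
print(balanced_class_weights(labels))  # {1: 5.0, 0: 0.5555555555555556}
```

With these weights, a misclassified positive costs nine times as much as a misclassified negative. Keras' `Model.fit` accepts a `{class: weight}` dict of this form through its `class_weight` argument. In PyTorch, `BCEWithLogitsLoss(pos_weight=...)` plays a similar role for the positive class.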
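The abstract does not name the three encodings used by m6A-CNLs. It also does not specify the integer assigned to each base in the sequential natural-number encoding used by m6Am Twins. The sketch below illustrates two generic encodings of an RNA window under stated assumptions:

- one-hot vectors in A, C, G, U order;
- a natural-number code with the hypothetical mapping A=1, C=2, G=3, U=4, with 0 left free for padding.

```python
# Hypothetical integer mapping: 0 is kept free for padding. The thesis
# does not say which number each base receives.
NATURAL_CODE = {"A": 1, "C": 2, "G": 3, "U": 4}
BASES = "ACGU"


def _normalize(seq):
    """Upper-case the sequence and write thymine (T) as uracil (U)."""
    return seq.upper().replace("T", "U")


def natural_number_encode(seq, length=None):
    """Map each base to an integer; optionally pad with 0 or truncate to `length`."""
    codes = [NATURAL_CODE[base] for base in _normalize(seq)]
    if length is not None:
        codes = (codes + [0] * length)[:length]
    return codes


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector in A, C, G, U order."""
    return [[1 if base == b else 0 for b in BASES] for base in _normalize(seq)]


print(natural_number_encode("GGACU"))           # [3, 3, 1, 2, 4]
print(natural_number_encode("acgt", length=6))  # [1, 2, 3, 4, 0, 0]
print(one_hot_encode("AU"))                     # [[1, 0, 0, 0], [0, 0, 0, 1]]
```

The two encodings suit different models. An integer code is compact and is typically fed to an embedding layer, as in a Transformer. One-hot vectors can go straight into a convolution. Ambiguous bases such as N raise a `KeyError` in this sketch.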
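The m6A-CNLs design described above has three stages: three encodings feed three CNN branches, each branch is followed by an LSTM, and the branch outputs are concatenated for classification. It can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the thesis implementation. The layer sizes, the per-branch input widths (4, 3, 1), the window length of 41, and the `pos_weight` value are all hypothetical placeholders. Random tensors stand in for the real encodings.

```python
import torch
import torch.nn as nn


class CNNLSTMBranch(nn.Module):
    """One branch: a 1-D convolution picks up local sequence patterns and an
    LSTM run over the convolved positions summarizes their context."""

    def __init__(self, in_channels, conv_channels=32, kernel_size=5, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, conv_channels, kernel_size,
                      padding=kernel_size // 2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(conv_channels, hidden, batch_first=True)

    def forward(self, x):
        # x: (batch, length, in_channels); Conv1d expects (batch, channels, length)
        h = self.conv(x.transpose(1, 2))
        # back to (batch, steps, features) for the batch_first LSTM
        _, (h_n, _) = self.lstm(h.transpose(1, 2))
        return h_n[-1]  # final hidden state, shape (batch, hidden)


class MultiBranchCNNLSTM(nn.Module):
    """Three encodings -> three CNN+LSTM branches -> concatenation -> logit."""

    def __init__(self, branch_channels=(4, 3, 1), hidden=32):
        super().__init__()
        self.branches = nn.ModuleList(
            [CNNLSTMBranch(c, hidden=hidden) for c in branch_channels]
        )
        self.classifier = nn.Linear(hidden * len(branch_channels), 1)

    def forward(self, inputs):
        # inputs: one tensor per encoding, each (batch, length, channels_i)
        features = [branch(x) for branch, x in zip(self.branches, inputs)]
        merged = torch.cat(features, dim=1)  # splice the feature sets together
        return self.classifier(merged).squeeze(1)  # raw logits, shape (batch,)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, window = 8, 41  # hypothetical batch size and sequence window
    inputs = [torch.randn(batch, window, c) for c in (4, 3, 1)]
    model = MultiBranchCNNLSTM()
    logits = model(inputs)
    # pos_weight > 1 up-weights the rare positive class (value is illustrative)
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([9.0]))
    loss = loss_fn(logits, torch.zeros(batch))
    print(logits.shape, loss.item())
```

In this sketch, each branch keeps its own convolution and LSTM, so each encoding learns its own local patterns before the features are merged. The final LSTM hidden state is one simple way to summarize context. Pooling over all LSTM outputs would be an equally plausible variant.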
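The abstract reports Sn, Sp, ACC and MCC without formulas. The sketch below computes them from confusion-matrix counts using their standard definitions; the function names are hypothetical. It also shows why MCC matters for imbalanced data. With positive fraction pi, ACC = pi * Sn + (1 - pi) * Sp holds exactly. Solving for pi, the reported test-set figures (up to rounding) imply a positive fraction of roughly 0.09 in each case, about one site per ten non-sites. On such a split, labeling everything negative already gives ACC near 0.91 but MCC of 0.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sn, Sp, ACC and MCC from the four confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity: recall on true sites
    sp = tn / (tn + fp)                      # specificity: recall on non-sites
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a margin is empty; reporting 0 is a common convention
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


def implied_positive_fraction(sn, sp, acc):
    """Solve ACC = pi * Sn + (1 - pi) * Sp for the positive fraction pi."""
    return (sp - acc) / (sp - sn)


# A classifier that labels everything negative on a 9:91 split
print(binary_metrics(tp=0, fn=9, tn=91, fp=0))  # ACC 0.91, MCC 0.0

# Positive fraction implied by the reported test-set figures (subject to rounding)
for name, sn, sp, acc in [
    ("m6A", 0.782, 0.968, 0.951),
    ("m6Am full transcript", 0.709, 0.921, 0.902),
    ("m6Am mature RNA", 0.645, 0.945, 0.918),
]:
    print(name, round(implied_positive_fraction(sn, sp, acc), 3))
```

Because the reported values are rounded to three decimals, the implied fractions are only approximate, roughly 0.085 to 0.097.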