Prediction of m6A(m) Methylation Sites Based on Ensemble Deep Learning

Posted on: 2024-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Luo
GTID: 2530306911993759
Subject: Statistics
Abstract/Summary:
N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. N6,2’-O-dimethyladenosine (m6Am) is a post-transcriptional modification that may be associated with regulatory roles in the control of cellular functions. However, detecting mRNA m6A(m) transcriptome-wide at base resolution remains an outstanding challenge for experimental approaches, which are generally time-consuming and expensive. Computational methods are a good strategy for accurate in silico detection of m6A(m) modification sites from the large amount of available RNA sequence data. Accurate transcriptome-wide identification of m6A(m) sites is therefore crucial for understanding the underlying m6A(m)-dependent mRNA regulation mechanisms and biological functions.

1) We review recent advances in the computational prediction of m6A sites and construct a new computational approach, named im6APred, that uses ensemble deep learning to accurately identify m6A sites based on high-confidence data from multiple mammalian tissues. im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms, three deep learning methods, and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation to obtain an effective stacked model. im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82–0.91 on independent tests, indicating that the model can learn general methylation rules on RNA bases and generalize to transcriptome-wide m6A identification. Moreover, AUROCs in the range of 0.77–0.96 were achieved using cross-species/tissue validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need to construct tissue-specific models for m6A site prediction.

2) We propose an ensemble deep learning framework, named DLm6Am, to identify m6Am sites. DLm6Am consists of three similar base classifiers, each of which contains a multi-head attention module, an embedding module with two parallel deep learning sub-modules, namely a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and a prediction module. To demonstrate the superior performance of this architecture, we compared multiple model frameworks with our method on the training data and the independent testing data. Additionally, we compared our model with the existing state-of-the-art computational methods, m6AmPred and MultiRM. On the independent testing data, the accuracy (ACC) of DLm6Am was improved by 6.45% and 8.42% compared to that of m6AmPred and MultiRM, respectively, while the AUROC was increased by 4.28% and 5.75%, respectively. All the results indicate that DLm6Am achieved the best prediction performance in terms of ACC, Matthews correlation coefficient (MCC), AUROC, and the area under the precision-recall curve (AUPR). To further assess the generalization performance of our proposed model, we implemented chromosome-level leave-out cross-validation and found that the obtained AUROC values were greater than 0.83, indicating that our proposed method is robust and can accurately predict m6Am sites.
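The chromosome-level leave-out evaluation and two of the reported metrics (ACC and MCC) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `leave_one_chromosome_out` and `accuracy_and_mcc` are hypothetical, holding out exactly one chromosome per fold is an assumption (the abstract only says "chromosome-level leave-out"), and labels are assumed to be binary (1 = modified site, 0 = unmodified).

```python
import math
from collections import defaultdict


def leave_one_chromosome_out(chromosomes):
    """Yield (held_out, train_idx, test_idx) for each distinct chromosome.

    `chromosomes` holds one chromosome label per candidate site. Assumption:
    each fold holds out all sites from exactly one chromosome.
    """
    by_chrom = defaultdict(list)
    for i, chrom in enumerate(chromosomes):
        by_chrom[chrom].append(i)
    for held_out in sorted(by_chrom):
        test_idx = by_chrom[held_out]
        train_idx = sorted(
            i for chrom, idx in by_chrom.items() if chrom != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def accuracy_and_mcc(y_true, y_pred):
    """Return (ACC, MCC) for binary labels, with 1 as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

In use, a model would be retrained on `train_idx` and scored on `test_idx` for every fold, so that no site from the held-out chromosome is seen during training.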
Keywords/Search Tags: N6-methyladenosine, N6,2’-O-dimethyladenosine, m6Am site identification, deep learning