Prediction of Modification Sites of Human m7G Based on Sequence Information

Posted on: 2021-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: C Ma
Full Text: PDF
GTID: 2370330611955225
Subject: Biomedical engineering
Abstract/Summary:
N-7 methylguanine (m7G) modification is one of the most common modifications in post-transcriptional regulation. It is widely distributed in the 5' region of tRNA, rRNA, and eukaryotic mRNA, and it plays an important role in RNA processing and metabolism, stability, nuclear export, and protein translation. Identifying m7G sites can therefore provide important clues for understanding its function. Most existing identification methods rely on biochemical experiments, whose limitations are becoming increasingly apparent. With the rapid development of sequencing technology, the accumulation of RNA data containing m7G modification sites offers an opportunity to study m7G site identification systematically. At present, however, few models exist for predicting m7G modification sites, which motivated us to develop a bioinformatics-based prediction model.

This thesis constructs a prediction model based on the sequence information of m7G modification sites. We first extracted four kinds of features from RNA sequences containing m7G modification sites: nucleotide property and frequency, pseudo nucleotide composition, k-mer, and single nucleotide binary code. Based on these four features, a support vector machine (SVM) was used to construct the m7G modification site prediction model. The accuracy of the model was then improved through parameter optimization, feature fusion, and feature selection. Furthermore, we built a bioinformatics tool based on the optimal model.

Ten-fold cross-validation shows that the proposed model achieves an accuracy of 94.67% with an area under the ROC curve of 0.98. To compare against alternatives, we also built prediction models with other machine learning algorithms and found that the SVM-based model has clear advantages over them. Finally, we shared the prediction model and project details on GitHub.
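To make two of the sequence encodings named in the abstract concrete, the following is a minimal sketch of single nucleotide binary code and k-mer frequency features, followed by a simple concatenation as one possible form of feature fusion. The function names, the A/C/G/U column order, and the choice of k = 2 are assumptions for illustration; the abstract does not state the window length, the value of k, or the exact encoding used in the thesis, and the SVM training step is not shown.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed column order for the encodings below


def binary_encode(seq):
    """Single nucleotide binary code: 4 bits per position (A=1000, C=0100, G=0010, U=0001)."""
    features = []
    for base in seq.upper():
        # Any character outside ALPHABET becomes an all-zero group of four bits.
        features.extend(1 if base == letter else 0 for letter in ALPHABET)
    return features


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order over ALPHABET."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] / total for m in kmers]


def fuse_features(seq, k=2):
    """Hypothetical feature fusion: concatenate the two encodings into one vector."""
    return binary_encode(seq) + kmer_frequencies(seq, k)
```

For a sequence of length n, `fuse_features` returns a vector of length 4n + 4^k, which could then be passed to any classifier that accepts fixed-length numeric input, provided all sequences share the same length.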
Keywords/Search Tags:N-7 methylguanine, pseudo nucleotide composition, machine learning, feature selection, prediction model