Sequence-based RNA Methylation Modification Site Prediction Studies

Posted on: 2018-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Li
GTID: 2350330512976800
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Post-transcriptional modifications (PTCMs) of RNA are very common in living organisms and play important roles in various biological processes; RNA methylation is an important branch of PTCMs. RNA methylation refers to methylation modifications that occur on some nucleotides of RNA, including N6-methyladenosine, N1-methyladenosine, etc. Recent research has shown that RNA methylation modifications can affect RNA transcription, metabolism, splicing, and stability, and can bind related proteins so as to regulate gene expression. Moreover, RNA methylation is associated with many diseases, such as tumors and obesity. It is therefore an important task to accurately identify RNA methylation modification sites from RNA sequences. Traditional methods of identifying RNA methylation sites are based on physicochemical experiments, which are expensive, time-consuming, and small-scale. The high-throughput sequencing techniques developed in recent years can identify RNA methylation sites efficiently and on a large scale; however, they are still biomedical experiment-based methods. A machine learning-based method for predicting RNA methylation sites is thus highly desirable. This thesis studies the sequence-based RNA methylation site prediction problem in depth. The main contributions are as follows:

(1) The basic characteristics of RNA are studied, and a new feature extraction method for RNA sequences is proposed. Motivated by the successful application of position-specific propensity theory to protein modification site prediction, this thesis applies the theory to RNA sequences and proposes the position-specific nucleotide/dinucleotide propensity feature for extracting RNA sequence features. The method computes the occurrence frequency of every nucleotide at every position of the RNA sequences, separately for the positive and the negative sample datasets, and uses the differences between the positive and negative samples for feature encoding. Experimental results show that the proposed feature extraction method further improves the accuracy of N6-methyladenosine site prediction.

(2) The sequence-based N6-methyladenosine site prediction problem is studied, and a prediction method named TargetM6A is proposed. TargetM6A uses the proposed position-specific nucleotide/dinucleotide propensity feature, along with the traditional nucleotide composition feature, to extract features from RNA sequences. The extracted features are then filtered by an incremental feature selection method to obtain a more discriminative feature subset, and a support vector machine is finally used to train the prediction model. Experimental results show that the proposed method achieves superior performance on benchmark datasets when compared with existing prediction methods.

(3) The sequence-based N1-methyladenosine site prediction problem is studied, and a prediction method named TargetM1A is proposed. Recently released experimental data on N1-methyladenosine are processed and sampled to build 3 species-based and 6 tissue-based N1-methyladenosine site datasets. TargetM1A extracts several RNA sequence-based features and uses the extremely randomized trees algorithm as its classifier. The method shows promising performance in cross-validation tests on both the species-based and the tissue-based datasets, and could be a useful complement to existing N1-methyladenosine site research methods, which are based on wet-lab experiments.

(4) Online prediction services are provided for the proposed TargetM6A and TargetM1A methods and are freely available to other researchers.
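The position-specific nucleotide propensity encoding described in item (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy sequences, and the choice of a plain frequency difference without further normalization are assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter

NUCLEOTIDES = "ACGU"


def position_frequencies(seqs):
    """Return, for each position, the fraction of sequences carrying each nucleotide."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        freqs.append({nt: counts[nt] / len(seqs) for nt in NUCLEOTIDES})
    return freqs


def propensity_matrix(positives, negatives):
    """Per-position difference between positive and negative nucleotide frequencies."""
    pos_f = position_frequencies(positives)
    neg_f = position_frequencies(negatives)
    return [{nt: p[nt] - n[nt] for nt in NUCLEOTIDES} for p, n in zip(pos_f, neg_f)]


def encode(seq, matrix):
    """Map a sequence to a feature vector by looking up each position's propensity."""
    # Symbols outside ACGU (hypothetical case, e.g. 'N') are encoded as 0.0.
    return [matrix[i].get(nt, 0.0) for i, nt in enumerate(seq)]


# Toy windows of length 5 centred on a candidate adenosine (made-up data).
positives = ["GGACU", "AGACU", "GAACA"]
negatives = ["UUACG", "CUAGG", "UCAUG"]

matrix = propensity_matrix(positives, negatives)
features = encode("GGACU", matrix)
```

The dinucleotide variant would follow the same pattern, counting adjacent nucleotide pairs at each position instead of single nucleotides. In this sketch the resulting vector has one value per sequence position and could be passed to any standard classifier.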
Keywords/Search Tags: RNA methylation, N6-methyladenosine, N1-methyladenosine, position-specific propensity, support vector machine, extremely randomized trees, online prediction service