
Research On Recognition And Application Of Methylation Sites Based On RNA Sequences

Posted on: 2020-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
GTID: 2370330590479050
Subject: Engineering
Abstract/Summary:
RNA post-transcriptional modification is the chemical modification of nucleotides at different positions of an RNA sequence after transcription. More than 150 post-transcriptional modifications have been discovered, the most common of which are 5-methylcytosine (m5C) and N6-methyladenosine (m6A). These two modifications not only play an important role in the fate of yeast cells but also have a regulatory effect on embryonic development in humans and animals. Accurate identification of m5C and m6A sites from primary RNA sequences is therefore especially useful for understanding their mechanisms and functions. Because identifying methylation sites with wet-lab techniques is difficult and expensive, there is an urgent need for machine learning-based methods that can predict these sites quickly and accurately. To further improve the performance of methylation site prediction models, this thesis makes the following contributions:

(1) A novel K-fold heuristic reduction algorithm based on a redundancy measure of nucleotide physical-chemical properties is designed. The algorithm removes redundant physical-chemical properties from the set used to re-encode the RNA sample sequences and yields K reduction subsets. K base classifiers are obtained by combining SVM with these reduction subsets, and the final predictor is constructed as an ensemble of the base classifiers. Rigorous jackknife tests on two benchmark datasets show that the predictor outperforms state-of-the-art methylation site predictors. On the m6A dataset, the MCC and AUC are 0.454 and 0.784, respectively. On the m5C dataset, the MCC and AUC are 0.859 and 0.962, respectively.

(2) A feature coding method based on statistical methods and classifier ensemble is designed. Three base classifiers are obtained by combining SVM with three feature representation methods, and the final predictor is constructed as an ensemble of these classifiers. Rigorous jackknife tests on two benchmark datasets show that the predictor outperforms state-of-the-art methylation site predictors. On the m6A dataset, the MCC and AUC are 0.542 and 0.829, respectively. On the m5C dataset, the MCC and AUC are 0.95 and 0.992, respectively.

(3) To make the methods easy for other researchers to use, a web server for online methylation site prediction was also designed and implemented.
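
The building blocks named in the abstract (re-encoding a sequence by nucleotide physical-chemical properties, combining base classifiers into an ensemble, and scoring with MCC and AUC) can be illustrated with a minimal sketch. This is not the thesis's implementation. The three-property code in `CHEM` (ring structure, functional group, hydrogen bonding) is a commonly used encoding assumed here for illustration, majority voting is an assumed combination rule, and all function names are hypothetical. The SVM base classifiers themselves are not reproduced.

```python
from math import sqrt

# ASSUMPTION: a commonly used 3-property code per nucleotide, given as
# (ring structure: purine=1, functional group: amino=1, hydrogen bond: weak=1).
# The abstract does not list the properties the thesis actually uses.
CHEM = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def encode(seq):
    """Re-encode an RNA sequence as a flat vector of property values."""
    vec = []
    for nt in seq.upper():
        vec.extend(CHEM[nt])
    return vec


def majority_vote(votes):
    """Combine 0/1 predictions of base classifiers (assumed rule; ties give 0)."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a jackknife (leave-one-out) test, each sample would be held out in turn, the base classifiers trained on the rest, and the held-out predictions collected before computing `mcc` and `auc` over the full set.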
Keywords/Search Tags: m5C methylation, m6A methylation, Heuristic reduction algorithm, Support vector machine, Classifier ensemble