
Research On RNA Editing Site Identification Algorithm Based On Deep Learning

Posted on: 2019-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: R X Guo
Full Text: PDF
GTID: 2370330611493334
Subject: Computer Science and Technology
Abstract/Summary:
RNA editing is a rare post-transcriptional sequence modification process in which mature RNA comes to differ from its template DNA sequence through the insertion, deletion, or substitution of bases. It is an important complement to the central dogma of genetics. RNA editing can cause polymorphisms in gene expression products by affecting the subcellular localization of mRNA and the regulation of heterochromatin gene silencing, and it can also alter non-coding RNA sequences and their interactions with target genes.

However, existing methods for identifying RNA editing sites rely heavily on prior knowledge and public genome annotations, and they suffer from complicated engineering, low precision, and poor generalization ability. At the same time, the massive multi-omics data generated by high-throughput sequencing technology poses a major challenge for RNA editing site identification. To address these problems, this study implements two deep learning-based RNA editing site identification algorithms that accurately identify RNA editing sites, avoid complicated manual filtering steps, and generalize well across cell lines. The work consists of three parts:

1. A method for constructing an RNA editing site gold standard set based on the ENCODE program. There is currently no common gold standard set for RNA editing site identification. So that deep learning can automatically extract and learn the basic features of RNA editing sites from a sample set, we propose a construction method based on the ENCODE program: the RNA-Seq data of 32 ENCODE cell lines are used to build a gold standard set, which serves as the sample set for the subsequent deep learning algorithms.

2. rnnRed, a bidirectional LSTM-based RNA editing site identification algorithm. We designed and implemented rnnRed, which automatically extracts and learns the basic features of RNA editing sites from the sample set by processing the input in both the forward and backward directions. The algorithm avoids the cumbersome manual filtering steps that depend on prior knowledge and public genome annotations, and it can accurately identify RNA editing sites within a collection of variant sites of various complex types. It achieved average AUCs of 95.97% on the 11 training-set cell lines and 95.82% on the 21 test-set cell lines, indicating good generalization ability across cell lines.

3. cnnRed, a ResNet-based RNA editing site identification algorithm. We designed and implemented cnnRed, which automatically extracts and learns the basic features of RNA editing sites from the sample set using a convolutional neural network built on a residual network. The algorithm avoids the cumbersome manual filtering steps that depend on prior knowledge and public genome annotations, and it can accurately identify RNA editing sites within a collection of variant sites of various complex types. It achieved average AUCs of 96.74% on the 11 training-set cell lines and 96.65% on the 21 test-set cell lines, indicating good generalization ability across cell lines.
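
The abstract reports AUC averaged over groups of cell lines. As a rough illustration of that kind of evaluation, the sketch below computes a rank-based ROC AUC for each cell line and then takes the mean. It is not the thesis code: the function names `roc_auc` and `mean_auc`, the cell-line keys, and the toy labels and scores are all hypothetical, and the thesis may compute or average AUC differently.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC AUC; tied scores share an average rank."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the run of items that share the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auc(per_cell_line):
    """Average per-cell-line AUCs; input maps a name to (labels, scores)."""
    return mean(roc_auc(y, s) for y, s in per_cell_line.values())


# Toy data only (1 = editing site, 0 = other variant); not from the thesis.
toy = {
    "cell_line_A": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "cell_line_B": ([0, 1], [0.2, 0.9]),
}
print(f"mean AUC over {len(toy)} cell lines: {mean_auc(toy):.3f}")
```

Averaging per-cell-line AUCs, rather than pooling all sites into one curve, gives every cell line equal weight regardless of how many candidate sites it contributes. That is one plausible reading of "average AUC" here, and it is an assumption.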
Keywords/Search Tags: RNA Editing, SNP, ENCODE program, LSTM, ResNet, AUC