Precise Identification Of RNA Editing Sites From DNA Sequence Data Based On Deep Learning Methods

Posted on: 2022-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Hao
Full Text: PDF
GTID: 2480306758492064
Subject: Automation Technology
Abstract/Summary:
RNA editing events are a key link in many biological activities. Gene regulation and RNA and protein regulation are all closely related to RNA editing. Analyzing the RNA editing sites of related genes therefore helps us better understand how RNA editing arises and develops. Some biological experimental methods can classify RNA editing sites, but they are time-consuming and costly, and they cannot accurately distinguish RNA editing sites from new SNP (single nucleotide polymorphism) sites. With the development of high-throughput sequencing technology, researchers have begun to classify RNA editing sites with machine learning methods, and some current studies use models such as logistic regression. The data processing in these methods requires specific prior knowledge and tedious filtering steps, and is time-consuming and labor-intensive. The data sets they use are also inconsistent, so their classification results carry a certain bias.

In response to these problems, this thesis makes the following contributions:

(1) A large number of RNA editing sites were re-collected: 118,212 human (hg19) sites from the REDIportal database and 84,109 mouse (mm10) sites from the DARNED database.
(2) One-Hot encoding is used to convert the pre-transcriptional DNA sequence corresponding to each RNA editing site into two-dimensional binary vectors, which removes the need for tedious filtering steps.
(3) Three deep learning models are proposed to identify RNA editing sites: a two-dimensional 4-layer convolutional neural network (CNN) model, a single-layer long short-term memory (LSTM) network model, and a model combining convolution with long short-term memory (CNN-LSTM).

Convolutional neural networks can be used effectively for sequence analysis: they map image data to output variables, learn the location and scale of variants in the data, and characterize them using convolutional filters followed by rectified linear units (ReLU). Our two-dimensional 4-layer CNN is therefore suited to data with spatial or ordered relationships. Long short-term memory (LSTM) is a special kind of recurrent neural network (RNN). It uses the sequential features of the input data to build loop connections between memory blocks, and the memory blocks are connected into layers. Each block contains gates that manage its state and output, and information about the sequence itself is stored in the hidden state of the LSTM across multiple time steps. Our single-layer LSTM network can therefore incorporate contextual information and locate distortions of past inputs, which helps solve sequence classification problems. To combine the advantages of the two networks, we built a CNN-LSTM model, which also achieved good results.

The human and mouse data are processed into 1*4 and 1*8 vectors through One-Hot encoding and fed into the deep learning models we built, giving accuracy rates of 98.70%, 98.36%, etc. We also performed k-fold cross-validation and obtained average F1-scores of 57.61%, 57.42%, etc. These results indicate that our model has good accuracy, stability and reliability. It provides a new way of thinking about RNA editing sites and a new way for researchers to classify them.
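
As an illustration of the One-Hot encoding step described in the abstract, the following is a minimal Python sketch, not the thesis's actual code. It assumes a four-letter alphabet in the order A, C, G, T, so that each base becomes a 1*4 binary vector and a sequence window becomes a two-dimensional binary matrix. The function name `one_hot_encode`, the channel order, the all-zero handling of unknown bases, and the example window are all hypothetical. The 1*8 variant mentioned in the abstract is not specified there and is not reproduced here.

```python
# Hypothetical helper: the abstract does not publish its encoding code.
ALPHABET = "ACGT"  # assumed channel order, one column per base


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 binary matrix as a list of lists."""
    rows = []
    for base in sequence.upper():
        # Bases outside ALPHABET (for example N) become an all-zero row.
        # This is an assumption made for the sketch.
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


# Made-up flanking window around a candidate site, not real data.
window = "ACGTN"
for base, row in zip(window, one_hot_encode(window)):
    print(base, row)
```

A matrix of this shape can be passed to a convolutional or recurrent layer without hand-crafted filtering features, which is the motivation the abstract gives for the encoding.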
Keywords/Search Tags: Convolutional network, Long short-term memory network, RNA editing site