Research And Application Of Deep Learning In CircRNA Recognition

Posted on: 2022-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Yin
GTID: 2480306512951749
Subject: Biomedical engineering
Abstract/Summary:
Circular RNAs (circRNAs) are a class of non-coding RNAs. Their closed ring structure makes them more stable than linear RNAs. Because circRNAs are specifically expressed in tumor cells, they can serve as clinical tumor markers, and identifying circRNA sequences is the key step in identifying such markers. Traditional identification methods, however, rely mainly on the biological expression characteristics of circRNAs or on alignment against existing RNA sequence libraries. As a result, they produce false positives and misclassify some RNA fragments.

Deep neural networks have developed rapidly over the past decade and are now widely applied in practice. In this thesis, whole RNA sequences are fed into deep learning models as input features, and the output feature vectors are used to predict the RNA sequence type. This end-to-end approach reduces the complexity of the pipeline, and deep learning models can effectively mine sequence features for accurate recognition and classification.

Human circRNAs and mRNAs were collected as the positive and negative datasets, respectively, giving about 40,000 RNA sequences in total, which were divided into training and test sets at a ratio of 9:1. Sequences were encoded with either one-hot encoding or word embeddings. Five models were used to build circRNA prediction models: LSTM (Long Short-Term Memory), TextCNN (Text Convolutional Neural Network), TextRCNN (Text Recurrent Convolutional Neural Network), Transformer, and ELMo (Embeddings from Language Models).

LSTM is a classical recurrent neural network that can retain information from different positions of a sequence to a certain extent. In this experiment, a single-layer unidirectional LSTM serves as the main structure of the first model, which reaches 97% accuracy on the test set. TextCNN uses a convolutional neural network and works well on local regions of the sequence. Four convolution kernels of different sizes are combined with a highway network, and the model reaches 98% accuracy. TextRCNN combines the advantages of the previous two: the sequence is first encoded by an RNN (recurrent neural network) and then passed to a CNN (convolutional neural network). Shifting the sequence turns the single input into three inputs, which simplifies the recurrent computation, and this model reaches 98% accuracy on the test set. The Transformer introduces the attention mechanism together with many techniques used in large neural networks. In this experiment, the original Transformer was slightly modified into a more lightweight version, which reaches a final test accuracy of 89%.

One novel aspect of this thesis is the introduction of the pre-trained ELMo model into RNA prediction. ELMo's parameters are pre-trained on known circRNA segments and coding sequences. In the downstream task, the pre-trained model produces context-dependent word embeddings for each sequence, and a simple classifier based on a linear combination of these embeddings reaches 95% accuracy on the test set.

For each model, the thesis discusses its characteristics and shortcomings in RNA sequence recognition, along with possible improvements, and compares the accuracy of the models across scenarios. In the fourth chapter, the Django framework is used to deploy the five models online and expose them through an API. Finally, the techniques behind all the models are summarized, and the thesis concludes that deep learning has bright prospects in RNA sequence recognition.
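The data preparation described above (circRNA positives, mRNA negatives, one-hot encoding, a 9:1 train/test split) can be sketched as follows. This is a minimal illustration, not the thesis's code: the alphabet handling (T mapped to U, unknown symbols such as N encoded as all zeros), the fixed shuffle seed, and the names `one_hot` and `split_9_to_1` are assumptions.

```python
import random

# Assumption: the RNA alphabet is A, C, G, U; DNA-style T is mapped to U, and any
# other symbol (e.g. an ambiguous N) becomes an all-zero vector.
BASES = "ACGU"
INDEX = {base: i for i, base in enumerate(BASES)}


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dimensional one-hot vectors."""
    vectors = []
    for base in seq.upper().replace("T", "U"):
        vec = [0] * len(BASES)
        if base in INDEX:
            vec[INDEX[base]] = 1
        vectors.append(vec)
    return vectors


def split_9_to_1(samples, train_ratio=0.9, seed=0):
    """Shuffle labelled samples reproducibly and split them into train/test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: label 1 = circRNA (positive set), 0 = mRNA (negative set).
data = [("ACGUACGU", 1), ("GGAUCCAU", 0)] * 10
train, test = split_9_to_1(data)
print(len(train), len(test))   # 18 2
print(one_hot("ACGN"))         # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

Shuffling before the cut matters here because positives and negatives are collected separately; without it, the test set would contain only one class.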
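A single-layer unidirectional LSTM, as used for the first model, can be sketched as a forward pass over one-hot vectors using the standard gate equations. The hidden size, the random untrained weights, the output weights `w_out`, and the class name `TinyLSTM` are hypothetical; a real model would be trained and written in a deep learning framework.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class TinyLSTM:
    """Single-layer, unidirectional LSTM forward pass with random (untrained) weights."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Each gate has an input matrix W, a recurrent matrix U and a bias b.
        self.W = {g: mat(hidden_size, input_size) for g in self.GATES}
        self.U = {g: mat(hidden_size, hidden_size) for g in self.GATES}
        self.b = {g: [0.0] * hidden_size for g in self.GATES}

    def _pre(self, gate, x, h):
        """Pre-activation W x + U h + b for one gate."""
        return [wx + uh + bias for wx, uh, bias in
                zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]

    def forward(self, xs):
        """Read the sequence left to right and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            i = [sigmoid(z) for z in self._pre("input", x, h)]
            f = [sigmoid(z) for z in self._pre("forget", x, h)]
            o = [sigmoid(z) for z in self._pre("output", x, h)]
            g = [math.tanh(z) for z in self._pre("candidate", x, h)]
            # The cell state carries information across positions; the gates decide
            # how much old state to keep (f) and how much new content to write (i).
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


# Usage: score a one-hot encoded sequence ("AGU") from the final hidden state.
lstm = TinyLSTM(input_size=4, hidden_size=8)
xs = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
h_last = lstm.forward(xs)
w_out = [0.5] * 8  # hypothetical output weights; a trained model would learn these
p_circ = sigmoid(sum(w * v for w, v in zip(w_out, h_last)))
print(f"P(circRNA) = {p_circ:.3f}")
```

Only the last hidden state feeds the classifier, which is why the cell state's ability to carry information from earlier positions matters for long RNA sequences.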
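The TextCNN idea is to slide filters of several widths along the sequence, apply ReLU, keep the maximum response of each filter, and concatenate the results. The abstract's wording about the highway network is ambiguous, so this sketch assumes one plausible arrangement: a highway layer applied to the pooled features. The kernel sizes (2, 3, 4, 5), one filter per size, the random weights, and the function names are all hypothetical.

```python
import math
import random


def conv_max_pool(xs, kernel, bias):
    """Slide one filter of width len(kernel) over xs; return max(0, best window score)."""
    width = len(kernel)
    best = 0.0  # starting at 0.0 makes the result equal to the max of ReLU outputs
    for start in range(len(xs) - width + 1):
        window = xs[start:start + width]
        z = bias + sum(w * x for k_row, x_row in zip(kernel, window)
                       for w, x in zip(k_row, x_row))
        best = max(best, z)
    return best


def highway(v, W_h, W_t, b_t):
    """Highway layer: y = t * relu(W_h v) + (1 - t) * v, with gate t = sigmoid(W_t v + b_t)."""
    h = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W_h]
    t = [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, v)) + b)))
         for row, b in zip(W_t, b_t)]
    return [tj * hj + (1.0 - tj) * vj for tj, hj, vj in zip(t, h, v)]


rng = random.Random(0)
kernel_sizes = (2, 3, 4, 5)  # assumed sizes; the abstract only says four different sizes
filters = [([[rng.uniform(-1, 1) for _ in range(4)] for _ in range(k)], 0.0)
           for k in kernel_sizes]

seq = "ACGUACGGAUCC"
xs = [[1 if base == b else 0 for b in "ACGU"] for base in seq]  # one-hot encoding
features = [conv_max_pool(xs, kernel, bias) for kernel, bias in filters]

n = len(features)
W_h = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
W_t = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
b_t = [-1.0] * n  # negative gate bias favours carrying features through unchanged
out = highway(features, W_h, W_t, b_t)
print(features, out)
```

Each filter width acts as a motif detector of a different length, and max pooling makes the detection position-independent, which fits the "local position" strength mentioned in the abstract.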
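In the original TextRCNN design, each position's representation is concatenated with a left and a right context computed by a bidirectional RNN, passed through a tanh layer, and max-pooled over positions. The abstract says the thesis turns one input into three by shifting the sequence. The sketch below assumes one possible reading: the previous and next positions stand in for the recurrent contexts. This is an illustration of that assumption, not the thesis's confirmed design, and `shift_inputs` and `rcnn_features` are hypothetical names.

```python
import math
import random


def shift_inputs(xs):
    """Build [x_{i-1}; x_i; x_{i+1}] for every position i of a non-empty sequence."""
    pad = [0.0] * len(xs[0])
    prev = [pad] + xs[:-1]   # position i sees x[i-1] (zero-padded at the start)
    nxt = xs[1:] + [pad]     # position i sees x[i+1] (zero-padded at the end)
    return [p + x + n for p, x, n in zip(prev, xs, nxt)]


def rcnn_features(xs, W, b):
    """Project each concatenated triple with tanh, then max-pool over positions."""
    ys = [[math.tanh(sum(w * v for w, v in zip(row, z)) + bj) for row, bj in zip(W, b)]
          for z in shift_inputs(xs)]
    return [max(column) for column in zip(*ys)]


rng = random.Random(0)
seq = "ACGUAC"
xs = [[1.0 if base == b else 0.0 for b in "ACGU"] for base in seq]
hidden = 6
W = [[rng.uniform(-0.5, 0.5) for _ in range(3 * 4)] for _ in range(hidden)]
b = [0.0] * hidden

print(shift_inputs([[1, 0], [0, 1]]))  # [[0.0, 0.0, 1, 0, 0, 1], [1, 0, 0, 1, 0.0, 0.0]]
feats = rcnn_features(xs, W, b)
```

Replacing the recurrent pass with fixed shifts removes the sequential dependency between steps, so every position can be computed independently. The cost is that each position sees only its immediate neighbours rather than the whole left and right context.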
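The attention mechanism at the core of the Transformer is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V. The sketch shows only this operation on a one-hot encoded sequence, using the inputs directly as queries, keys, and values. Learned projections, multiple heads, positional encoding, layer normalization, and feed-forward blocks are omitted, and the abstract does not specify how the thesis's lightweight variant differs from the original.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Self-attention over a one-hot encoded sequence, using X directly as Q, K and V.
seq = "ACGA"
X = [[1.0 if base == b else 0.0 for b in "ACGU"] for base in seq]
Y = scaled_dot_product_attention(X, X, X)
for base, row in zip(seq, Y):
    print(base, [round(v, 3) for v in row])
```

Every output position is a weighted average over all positions, so distant nucleotides interact directly instead of through a chain of recurrent steps.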
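ELMo represents each token as a task-specific weighted sum of its pre-trained layer outputs, γ · Σⱼ softmax(s)ⱼ · hⱼ, where s and γ are learned in the downstream task. The sketch assumes that the abstract's "linear combination" refers to this layer mix, followed by mean pooling over the sequence and a logistic output layer. The layer outputs below are made-up numbers standing in for a pre-trained model, and all function names are hypothetical.

```python
import math


def scalar_mix(layers, s, gamma):
    """ELMo-style mix for one token: gamma * sum_j softmax(s)_j * h_j."""
    m = max(s)
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(w * h[d] for w, h in zip(weights, layers)) for d in range(dim)]


def classify(token_layers, s, gamma, w_out, b_out):
    """Mix each token's layers, average over the sequence, then apply a logistic layer."""
    mixed = [scalar_mix(layers, s, gamma) for layers in token_layers]
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]
    z = sum(w * x for w, x in zip(w_out, pooled)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical layer outputs for a 2-token sequence: 3 layers per token, dimension 2.
token_layers = [
    [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]],
    [[0.2, 0.0], [-0.2, 0.5], [0.1, 0.1]],
]
p = classify(token_layers, s=[0.0, 0.0, 0.0], gamma=1.0, w_out=[1.0, -1.0], b_out=0.0)
print(f"P(circRNA) = {p:.3f}")
```

Because the pre-trained layers stay fixed and only s, γ, and the output weights are fit, the downstream classifier stays small. This matches the abstract's description of a simple classifier on top of pre-trained embeddings.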
Keywords/Search Tags: circRNA, deep learning, LSTM, TextCNN, Transformer, ELMo