
Protein-RNA Binding Prediction Based On Bi-LSTM And DenseNet

Posted on: 2021-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: J P Zhu
Full Text: PDF
GTID: 2370330602481589
Subject: Computer Science and Technology
Abstract/Summary:
The complexes formed by proteins binding to RNAs are essential in biological processes, and are also relevant to identifying causal disease variants and to the regulation of gene expression and translation. Protein-RNA interactions identified in vivo can be affected by experimental conditions, noise, and bias, while in vitro experiments yield clearer signals. How to accurately infer protein-RNA binding models from in vitro data, so that they can then predict bound and unbound RNA transcripts in vivo, has therefore become a key challenge. To address this problem, this thesis constructs a protein-RNA binding prediction model (RDense) based on a bidirectional long short-term memory network (Bi-LSTM) and densely connected convolutional networks (DenseNet). The main work is as follows:

(1) In data extraction, considering the important role that the primary structure information of the sequence plays in biological processes, one-hot encoding is used to represent the RNA sequence and expand its features. In addition, the primary role of RNA secondary structure for RNA-binding proteins (RBPs) is to establish a structural context (such as loop or unstructured) in which the RBP recognizes the RNA sequence. This thesis uses variants of RNAplfold to extract relatively stable RNA secondary structures. Finally, this thesis mines the secondary structure information further, using the Triplet method in repRNA, which considers the paired and unpaired states of the secondary structure, to extract an RNA digital feature vector.

(2) This thesis constructs a new deep neural network model (RDense). In addition to the existing RNA sequence and secondary structure information, the digital feature vector extracted from the RNA secondary structure is introduced as input and combined with the Bi-LSTM and DenseNet to learn protein-RNA binding preferences. For model optimization, the loss function uses the absolute squared percent error to reduce the impact of discrete data, and cross-validation is added for hyperparameter tuning. The experimental results show that predictions on in vitro data are significantly better than those of current methods, and that the performance of the model is significantly improved. Ultimately, a model trained on in vitro data can be used to predict both the bound and unbound states of RNA transcripts in vivo.

(3) The prediction results of different network structures on the in vitro dataset are compared, and the experimental results show that the model structure constructed in this thesis performs best among them. In addition, comparing the in vitro and in vivo prediction results of different methods shows that the introduced RNA digital feature vector improves the prediction of protein-RNA binding preference. To examine the specific role of protein-RNA binding during network training, this thesis derives an interpretable representation from the model, draws sequence logos using the ggseqlogo software package, and compares the sequences and structures visualized by different methods. Finally, this thesis summarizes the research work on protein-RNA binding prediction and looks ahead to future work.
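The two feature-extraction steps named in the abstract, one-hot encoding of the RNA sequence and a triplet feature built from paired/unpaired structure states, can be illustrated with a minimal Python sketch. This is not the thesis's code and does not call repRNA or RNAplfold. The function names, the A/C/G/U channel order, the all-zero row for unknown bases, and the 32-dimensional layout (middle nucleotide combined with the paired/unpaired pattern of three adjacent positions) are assumptions made for illustration. The secondary structure is assumed to be supplied as a dot-bracket string.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed channel order; the thesis does not specify one


def one_hot_encode(seq):
    """Return one 4-dimensional indicator row per nucleotide (A, C, G, U)."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        # Assumption: unknown symbols such as N become an all-zero row.
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


def triplet_features(seq, structure):
    """Return normalized counts of (middle nucleotide, 3-position structure) triplets.

    `structure` is a dot-bracket string; '(' and ')' are both treated as
    paired and '.' as unpaired, giving 4 * 2**3 = 32 features.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    # Collapse dot-bracket notation to two states: '(' paired, '.' unpaired.
    states = "".join("." if c == "." else "(" for c in structure)
    keys = [n + "".join(p) for n in NUCLEOTIDES for p in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # triplets centered on an unknown base are skipped
            counts[key] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in keys]


if __name__ == "__main__":
    sequence = "GGGAAACCC"      # toy hairpin, chosen only for illustration
    dot_bracket = "(((...)))"
    print(one_hot_encode(sequence)[:2])
    print(len(triplet_features(sequence, dot_bracket)))
```

In a model of the kind the abstract describes, the per-position one-hot rows would feed the sequence branch, while the fixed-length triplet vector would serve as the additional digital feature input. How RDense actually combines them is not stated here.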
Keywords/Search Tags: Protein-RNA binding model, DenseNet, Bi-LSTM, RNA digital feature, Sequence logo