
Research On Prediction Of RNA And Protein Binding Sites Based On Sequence And Structural Information

Posted on: 2022-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Li
GTID: 2480306332453374
Subject: Computer technology

Abstract:
The interaction between protein and RNA is indispensable in many life activities. It is closely related to processes in living organisms such as gene translation and expression and disease regulation. With the rapid development of sequencing technology, the number of known RNA-protein interactions continues to grow, which makes it possible to use machine learning methods to predict RNA-protein interactions on a large scale. Over the past ten years, deep learning models have been widely used in prediction tasks based on biological sequences, including predicting interactions between RNA-binding proteins (RBPs) and RNA.

The performance of a prediction model is generally limited by two factors: the feature representation of the input data and the classification model. Traditional machine learning methods generally require experimenters to design data features manually from domain knowledge, whereas deep learning methods have advantages in feature representation and learning capability. They can not only improve prediction accuracy but also help identify the sequence motifs that are critical to binding affinity. As for the choice of classification model, sequence processing in bioinformatics resembles text processing in natural language processing, so advanced research results from natural language processing can be borrowed.

This research therefore approaches the problem from the perspectives of data sources and network models. First, we collect, sort, and screen the RNA sequence data in the existing CLIP-seq database, use RNAfold and bpRNA to obtain the corresponding secondary structure annotations, and use the sequences together with their secondary structure annotations as the data source. For the network model, a convolutional neural network (CNN) extracts the sequence features and the structural features of RNA separately and combines them, and position embedding is then used to learn long-range dependencies in the sequence. On this basis, we propose a hybrid neural network model, RBPformer, which uses RNA sequence information and secondary structure information to predict binding sites on RNA. In addition, we run experiments to test whether secondary structure information improves the model's prediction performance.

Finally, the model is evaluated on a CLIP-seq dataset composed of 31 experiments covering 19 proteins and compared with similar prediction algorithms. The experimental results show that RNA secondary structure information does improve the model's prediction performance, and that the hybrid neural network has clear advantages over comparable models. The AUC of the model reaches 84% on the test set, indicating that the hybrid neural network performs well. The proposed hybrid neural network model and the addition of secondary structure features provide new ideas for in-depth study of the interaction between RNA and protein.
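As a rough illustration of the input representation described above, the sketch below one-hot encodes an RNA sequence and its per-position structure annotation as two separate matrices, which is the kind of input a CNN could process in separate branches. This is not the thesis's implementation: the function names, the structure alphabet `SHIMBEX` (chosen here to resemble bpRNA-style labels), and the handling of unknown symbols are all assumptions made for illustration.

```python
# Illustrative sketch only: alphabets and function names are assumptions,
# not taken from the thesis.
SEQ_ALPHABET = "ACGU"
# Assumed per-position structure labels in the style of bpRNA annotations
# (stem, hairpin, internal loop, multiloop, bulge, end, external).
STRUCT_ALPHABET = "SHIMBEX"


def one_hot(symbols, alphabet):
    """Return one row per position; unknown symbols (e.g. N) become all-zero rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in symbols:
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def encode_rna(sequence, structure):
    """Encode a sequence and its structure annotation as two one-hot matrices."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")  # treat DNA-style T as U
    return one_hot(seq, SEQ_ALPHABET), one_hot(structure.upper(), STRUCT_ALPHABET)


seq_matrix, struct_matrix = encode_rna("GGGAAACCC", "SSSHHHSSS")
print(len(seq_matrix), len(seq_matrix[0]))        # 9 4
print(len(struct_matrix), len(struct_matrix[0]))  # 9 7
```

Keeping the two matrices separate, rather than concatenating them up front, mirrors the abstract's description of extracting sequence and structural features separately before combining them.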
Keywords: Convolutional Network, Sequence Encoding, Structure Encoding, Transformer