Prediction Of Protein Binding Sites On RNA Sequences Based On Machine Learning

Posted on: 2020-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: K M Zhang
GTID: 2370330623963637  Subject: Computer technology
Abstract/Summary:
Complexes formed by the binding of RNAs to proteins play key roles in many biological processes. Because detecting RNA-protein interactions through biological experiments is laborious and time-consuming, computational prediction tools are in great demand. The performance of such tools depends on two factors: the feature representation of RNA sequences and the classification model. Existing methods adopt statistical features or one-hot vectors, and most of their classifiers are traditional machine learning models; distributed representations and flexible deep learning architectures have not yet been exploited. In this study, we therefore represent RNA sequences by continuous distributed features and propose a hybrid deep learning architecture that combines a convolutional neural network (CNN), which learns high-level abstract features, with a recurrent neural network (RNN), which learns long-range dependencies in the sequences. In addition, for circular RNA (circRNA), a rising star of the RNA world, we construct the CRIP (CircRNA Interact with Proteins) model, which uses only RNA sequences to predict protein binding sites on circRNAs. To fully exploit the sequence information, we propose a stacked codon-based encoding scheme. The experimental results show that the new encoding scheme is superior to existing feature representation methods for RNA sequences, and that the hybrid network outperforms conventional classifiers by a large margin, with both the CNN and RNN components contributing to the performance improvement. To the best of our knowledge, CRIP is the first machine learning-based tool specialized in predicting circRNA-protein interactions, and it is expected to play an important role in large-scale functional analysis of circRNAs.
Keywords/Search Tags:circRNA, CRIP, codon-based encoding, CNN, LSTM
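
The abstract names a stacked codon-based encoding but does not spell out its details. As a rough illustration only, the sketch below assumes that the sequence is split into overlapping nucleotide triplets (codons) with a stride of 1 and that each codon is one-hot encoded over the 64 possible triplets. The function name `stacked_codon_encode`, the `stride` parameter, and the 64-entry vocabulary are hypothetical choices made for this example and may differ from the thesis's actual scheme.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumed vocabulary: all 64 nucleotide triplets in lexicographic order.
CODON_INDEX = {"".join(c): i for i, c in enumerate(product(NUCLEOTIDES, repeat=3))}


def stacked_codon_encode(seq, stride=1):
    """Encode an RNA sequence as a list of one-hot rows, one per codon.

    Overlapping codons are taken with the given stride (assumed to be 1).
    Codons containing a symbol outside A/C/G/U become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    rows = []
    for start in range(0, len(seq) - 2, stride):
        codon = seq[start:start + 3]
        row = [0] * len(CODON_INDEX)
        idx = CODON_INDEX.get(codon)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


# A 6-nt sequence yields 4 overlapping codons: AUG, UGG, GGC, GCC.
encoded = stacked_codon_encode("AUGGCC")
```

The resulting matrix (number of codons by vocabulary size) is the kind of input a CNN layer followed by an RNN layer could consume; a non-overlapping variant would simply use `stride=3`.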