EDLMFC: An Ensemble Deep Learning Framework With Multi-scale Features Combination For NcRNA-protein Interaction Prediction

Posted on: 2022-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
GTID: 2480306764994849
Subject: Chemistry
Abstract/Summary:
Non-coding RNAs (ncRNAs) play an indispensable regulatory role in many life activities, such as translation, splicing, post-transcriptional gene regulation, gene modification, gene degradation, chromatin remodeling, and human diseases. In eukaryotes, many ncRNAs perform their functions by interacting with proteins. Therefore, predicting ncRNA-protein interactions (ncRPIs) is of great significance for studying ncRNA function and for disease diagnosis. Currently, experimental methods for determining ncRPIs remain time-consuming and labor-intensive, so computational methods are urgently needed to predict ncRPIs quickly and accurately.

An ensemble deep learning model with multi-scale features combination, named EDLMFC, is proposed to predict ncRPIs. The multi-scale features include not only primary sequence features but also secondary structure sequence features and tertiary structure features. Sequence features are encoded by the conjoint k-mer coding method. The tertiary structure features are then superimposed, and the result is fed into a deep learning model that integrates a convolutional neural network (CNN) with a bi-directional long short-term memory network (BLSTM). The CNN extracts high-level abstract features of the ncRNA or protein and passes them to the BLSTM to capture long-range dependencies. Two similar CNN-BLSTM networks are constructed to learn ncRNAs and proteins, respectively. Each network converts its learned features into a feature column vector through one fully connected layer. The two feature column vectors are then concatenated, and three fully connected layers followed by a Softmax activation function determine whether the ncRNA and the protein interact.

To evaluate EDLMFC, we selected RPITER, IPMiner and CFRP for comparison and performed five-fold cross-validation (5CV) on three datasets: RPI1807, NPInter v2.0 and RPI488. Since the selection of samples in the training procedure is random, the 5CV was repeated 10 times on each dataset, and the average of the 10 results was taken as the final result. The accuracy (ACC), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), F1-score (F1), Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) of EDLMFC on the three datasets are 0.861, 0.745, 0.967, 0.961, 0.829, 0.742 and 0.899; 0.938, 0.969, 0.845, 0.949, 0.959, 0.833 and 0.967; 0.897, 0.917, 0.877, 0.882, 0.899, 0.795 and 0.959, respectively. Overall, EDLMFC is 0.1-7.7% better than RPITER, IPMiner, and CFRP.

Experiments with different feature combinations show that the primary sequence features are the most important, and that the secondary and tertiary structure features also contain useful information. When all the features are used as inputs, they complement each other, making the model more accurate in predicting ncRPIs.

For independent verification, the NPInter v2.0 dataset was divided into six categories according to species source, namely Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Escherichia coli. The ACC of the six categories reached 85.3%, 94.8%, 91.2%, 93.9%, 89.1% and 93.1%, respectively, and the overall ACC reached 89.7%. In addition, a Mus musculus ncRNA-protein interaction network constructed from the independent validation results identified the hub ncRNAs and proteins in Mus musculus ncRPIs, which will help to analyze the biological functions of ncRNAs and proteins, understand the mechanisms of key life activities, and facilitate various medical and pharmaceutical studies. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC.
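
As an illustration of the conjoint k-mer coding mentioned in the abstract, the following Python sketch turns a sequence into a fixed-length frequency vector by concatenating the normalized k-mer counts for k = 1 up to a maximum k. It is not taken from the EDLMFC source code. The seven-group amino-acid reduction (the grouping commonly used in conjoint triad encodings), the maximum k of 4 for RNA and 3 for proteins, the normalization by the number of valid windows, and all function names are assumptions made for this example.

```python
from itertools import product

# Assumed seven-group reduction of the amino-acid alphabet (conjoint triad
# style); the abstract does not state which grouping EDLMFC uses.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}


def kmer_frequencies(seq, alphabet, k):
    """Normalized counts of every k-mer over `alphabet`, in a fixed order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing unknown symbols are skipped
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def conjoint_kmer(seq, alphabet, max_k):
    """Concatenate the k-mer frequency vectors for k = 1 .. max_k."""
    vec = []
    for k in range(1, max_k + 1):
        vec.extend(kmer_frequencies(seq, alphabet, k))
    return vec


def encode_rna(seq, max_k=4):
    """Encode an RNA sequence over A/C/G/U (T is read as U)."""
    return conjoint_kmer(seq.upper().replace("T", "U"), "ACGU", max_k)


def encode_protein(seq, max_k=3):
    """Map residues to the seven assumed groups, then encode the result."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in seq.upper())
    return conjoint_kmer(reduced, "0123456", max_k)


rna_vec = encode_rna("ACGUACGU")
protein_vec = encode_protein("MKVLA")
print(len(rna_vec), len(protein_vec))
```

Under these assumptions the RNA vector has 4 + 16 + 64 + 256 = 340 entries and the protein vector has 7 + 49 + 343 = 399 entries, regardless of sequence length, which is what allows sequences of different lengths to feed a fixed-size network input.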
Keywords/Search Tags: ncRNAs-protein interaction, multi-scale features combination, conjoint k-mer, ensemble deep learning, ncRNA-protein networks