Font Size: a A A

Predicting The Whole-Genome Level Protein-RNA Interactions Based On Ensemble Learning

Posted on:2020-09-17Degree:MasterType:Thesis
Country:ChinaCandidate:C H ZhanFull Text:PDF
GTID:2370330590952085Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The interaction between RNA and protein plays a crucial role in regulating various cellular processes in organisms(such as regulation of gene expression),but research on the network of interactions between them has never been breakthrough in recent years.Traditional interaction prediction models are mostly constructed based on structural information of RNA and protein,but the generalization performance of such computational models comes from small datasets is not strong enough,and the source of structural datasets is as well as limited.On the other hand,most of such prediction models at this stage are only based on a single classifier.However,related theoretical studies of machine learning show that a classifier integrated from different basic classifiers has higher prediction accuracy and better stability than a single basic classifier.In view of the current difficulties in predicting the interactions between RNA and protein,two computational models only based on their sequence information instead of structural are proposed to solve these problems.A prediction model based on the deep learning stacking auto-encoder network combined with a random forest classifier is proposed in the first part of this paper.The model employs the position specific scoring matrix and the k-mer matrix to represent the protein and RNA sequences respectively,and then extracts the corresponding feature vectors based on the bi-gram and singular value decomposition algorithms.The prediction model also employs a deep learning stack auto-encoder network to extract and fuse the advanced hidden information from these vectors.These extracted information and labels are then fed into a random forest classifier to construct a basic predictive model.In addition,the model adopts the stacked integration strategy to integrate three different base prediction models to improve its prediction performance.Experiment results based on three public datasets show that the performance of this RNA and protein interaction prediction model can be improved by combining deep learning with feature extraction and using ensemble learning to integrate different base classifiers.A prediction model for RNA and protein interaction based on the boosting integrated learning LightGBM classifier is proposed in the second part of this paper.This second model also employs the position specific scoring matrix and the k-mer matrix to represent the protein and RNA sequences respectively,and then extracts the corresponding feature vectors using pseudo-Zernike moments and singular value decomposition algorithms.Furthermore,these feature vectors along with the labels are fed into a LightGBM classifier to obtain the final predictive model.Experiments based on four public datasets show that the LightGBM-based prediction model can keep good prediction performance while reducing the training time and memories.
Keywords/Search Tags:RNA and protein interaction, k-mer, stacked auto-encoder, ensemble learning, LightGBM
PDF Full Text Request
Related items