Predicting The Whole-Genome Level Protein-RNA Interactions Based On Ensemble Learning

Posted on:2020-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:C H Zhan

Full Text:PDF

GTID:2370330590952085

Subject:Computer application technology

Abstract/Summary:

Interactions between RNA and proteins play a crucial role in regulating various cellular processes, such as gene expression, yet research on their interaction networks has seen no breakthrough in recent years. Traditional interaction prediction models are mostly built on structural information of RNA and proteins, but such models are trained on small datasets and generalize poorly, and sources of structural data are limited. In addition, most current prediction models rely on a single classifier, whereas theoretical studies in machine learning show that a classifier integrated from different base classifiers achieves higher prediction accuracy and better stability than a single base classifier. To address these difficulties, this thesis proposes two computational models that rely only on sequence information rather than structural information.

The first part of this thesis proposes a prediction model that combines a deep learning stacked auto-encoder network with a random forest classifier. The model represents protein sequences with the position-specific scoring matrix and RNA sequences with the k-mer matrix, and then extracts the corresponding feature vectors using the bi-gram and singular value decomposition algorithms. A stacked auto-encoder network then extracts and fuses high-level hidden information from these vectors. The extracted features and the labels are fed into a random forest classifier to build a base prediction model. In addition, the model adopts a stacking ensemble strategy to integrate three different base prediction models and improve prediction performance. Experimental results on three public datasets show that the performance of this RNA-protein interaction prediction model can be improved by combining deep learning with feature extraction and by using ensemble learning to integrate different base classifiers.

The second part of this thesis proposes an RNA-protein interaction prediction model based on LightGBM, a boosting ensemble learning classifier. This model also represents protein sequences with the position-specific scoring matrix and RNA sequences with the k-mer matrix, and then extracts the corresponding feature vectors using pseudo-Zernike moments and singular value decomposition. These feature vectors, along with the labels, are fed into a LightGBM classifier to obtain the final prediction model. Experiments on four public datasets show that the LightGBM-based model maintains good prediction performance while reducing training time and memory usage.
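
As an illustration of the k-mer representation of RNA sequences mentioned in the abstract, the following is a minimal sketch that turns one RNA sequence into a normalized k-mer frequency vector. The abstract does not state the value of k or the exact layout of the k-mer matrix, so the function name `kmer_frequencies`, the default `k=3`, and the lexicographic ordering over `ACGU` are assumptions made for illustration only, not the thesis's actual implementation.

```python
from itertools import product


def kmer_frequencies(rna, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies for one RNA sequence.

    Hypothetical helper: k=3 and the lexicographic k-mer order are
    assumptions, not settings taken from the thesis.
    """
    # Normalize case and accept DNA-style input by mapping T to U.
    rna = rna.upper().replace("T", "U")

    # Enumerate all 4**k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    # Slide a window of length k over the sequence.
    for i in range(len(rna) - k + 1):
        window = rna[i:i + k]
        if window in index:  # skip windows with ambiguous bases such as N
            counts[index[window]] += 1
            total += 1

    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

With `k=3` the vector has 64 entries, one per possible trinucleotide. A vector of this kind is the sort of input that could then be reduced with singular value decomposition and passed to a classifier, as the abstract describes.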

Keywords/Search Tags:

RNA and protein interaction, k-mer, stacked auto-encoder, ensemble learning, LightGBM

Related items

1	Research On Plant Leaf Image Classification Based On Stacked Auto-Encoder Network
2	Prediction Of Protein-protein Interactions Based On Wavelet Transform And Ensemble Learning
3	Research On Network Representation Learning Algorithm Based On Subgraph Convolution Auto-Encoder
4	Classification Of Non-classical Secreted Proteins Of Gram-positive Bacteria Based On Two-layer LightGBM-based Ensemble Model
5	A Deep Neural Network Model Integrating Protein Interactions For Prioritizing Cancer-related Proteins And Drug Target Combination
6	Searching Method Study On M Subdwarfs Based On Ensemble Learning
7	Reconstruction Of Protein Structure Based On Auto Encoder
8	Research On The Protein Modification Sites Based On Machine Learning
9	Application Of Deep Learning In Hyperspectral Image Classification
10	Remote Sensing Inversion Of Soil Polymetallic Elements Content Based On Stacked Auto-encoder Extreme Learning Machine Model