Research On Prediction Of Protein-Protein Interactions In Plants Based On Ensemble Learning

Posted on: 2022-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Pan
GTID: 2480306725468934
Subject: Master of Engineering
Abstract/Summary:
Prediction of protein-protein interactions (PPIs) in plants is an important aspect of systems biology. Understanding plant PPI networks can provide major insights into the regulation of developmental, pathological and physiological processes. Recently, with the improvement of high-throughput technologies, a large amount of protein sequence data has been generated, providing a valuable resource for predicting PPIs. However, traditional experimental methods are costly and time-consuming. Thus, there is an urgent need for an accurate and effective computational model to predict potential protein-protein interactions in plants. To overcome the drawbacks of the traditional experimental methods, the following work was carried out.
(1) We collected three plant PPI datasets from public databases, covering the model plant Arabidopsis and two important food crops, Zea mays (maize) and Oryza sativa (rice). We employed the Position Specific Scoring Matrix (PSSM) to represent each protein sequence as a matrix that captures the evolutionary information of its amino acid sequence.
(2) Feature extraction from protein sequences is one of the main components of the prediction model. To better predict potential interacting plant protein pairs, we applied three feature extraction algorithms to extract 400-dimensional feature vectors from the PSSMs: the Walsh-Hadamard transform (WHT), the Dual-tree complex wavelet transform (DTCWT) and the Inverse Fast Fourier transform (IFFT).
(3) To make better use of the three proposed feature descriptors, we combined them with two ensemble learning algorithms, Rotation Forest (RoF) and Random Forest (RF). The feature descriptors were fed into the RoF and RF classifiers for training and prediction. In addition, to obtain more reliable prediction results and reduce the risk of overfitting, a five-fold cross-validation framework was employed.
(4) To demonstrate the performance of the proposed method, we compared it with Deep neural networks (DNN), K-nearest neighbors (KNN) and Support vector machine (SVM). The comprehensive results indicated that our method has better prediction ability.
In future work, we hope the proposed method can serve as a useful tool in proteomics research.
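As an illustration of step (2), the following is a minimal sketch of how a Walsh-Hadamard transform could turn a variable-length PSSM into a fixed 400-dimensional vector. The abstract does not state how the 400 dimensions are obtained, so the recipe here (zero-pad the sequence axis to a power of two, transform each of the 20 PSSM columns, keep the first 20 coefficient rows) is an assumption made for illustration only, and the function names `fwht` and `wht_features` are hypothetical rather than taken from the thesis.

```python
import numpy as np


def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform along axis 0.

    The length of axis 0 must be a power of two. Coefficients are
    returned in natural (Hadamard) order.
    """
    x = np.asarray(x, dtype=float).copy()
    n = x.shape[0]
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length of axis 0 must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h pairwise.
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x


def wht_features(pssm, n_keep=20):
    """Map an (L, 20) PSSM to a fixed-length vector of n_keep * 20 values.

    ASSUMPTION (not from the thesis): zero-pad the sequence axis to the
    next power of two, apply the WHT to each column, and keep the first
    n_keep coefficient rows.
    """
    pssm = np.asarray(pssm, dtype=float)
    length, n_cols = pssm.shape
    target = max(length, n_keep)
    padded_len = 1 << (target - 1).bit_length()  # next power of two
    padded = np.zeros((padded_len, n_cols))
    padded[:length] = pssm
    coeffs = fwht(padded)
    return coeffs[:n_keep].ravel()


# Usage with a synthetic PSSM (random integers stand in for real scores).
rng = np.random.default_rng(0)
demo_pssm = rng.integers(-5, 6, size=(57, 20))
demo_features = wht_features(demo_pssm)
print(demo_features.shape)
```

Vectors built this way for the two proteins of a candidate pair could then be combined and passed to an ensemble classifier under five-fold cross-validation, as described in step (3); how the thesis combines the two proteins' features is not stated in the abstract.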
Keywords/Search Tags: plant, protein-protein interaction, Walsh-Hadamard transform, Dual-tree complex wavelet transform, Inverse Fast Fourier transform, rotation forest, random forest