Prediction Of Protein Post-translational Modification Sites Based On Ensemble Learning And Flexible Neural Tree

Posted on: 2018-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Han
Full Text: PDF
GTID: 2310330512981824
Subject: Computer Science and Technology
Abstract/Summary:
Protein post-translational modifications play a crucial role in the life of eukaryotic cells. Through the interaction and cooperation of a variety of post-translational modifications, many cellular functions are maintained and promoted. However, identifying post-translational modification types experimentally requires extensive, time-consuming work and is inefficient. Effective bioinformatics tools for identifying modified sites are therefore urgently needed. Based on protein sequences and a combination of feature extraction methods, this thesis predicts protein phosphorylation and phosphoglycerylation sites using computational intelligence methods.

For phosphorylation, the work takes the function of the modification as its starting point: protein sequences related to signal transduction are extracted from a phosphorylation modification database. A new feature extraction method is also proposed, in which information about the groups that amino acid residues belong to is integrated into features based on amino acid frequency. Simulation results show that, once the grouping information is included, prediction accuracy improves noticeably for modification sites of the same type. For example, with a feed-forward neural network optimized by particle swarm optimization, prediction accuracy improves from 58% to about 86%. On this basis, preliminary experiments tested the effect of different amino acid sequence lengths. The results show that prediction accuracy is best when the sequence window contains 23 amino acid residues. The data sets are then arranged for 10-fold cross-validation, and three models (a neural network, a support vector machine, and a flexible neural tree) are trained with the new feature extraction method and combined by voting, following the principle that the minority yields to the majority. The experimental results show that prediction accuracy reaches 87.50% after ensemble learning over the three models, which is higher than previously reported results.

For phosphoglycerylation, a flexible neural tree is applied to the prediction of phosphoglycerylation sites, and its simulation results are compared with the latest results in this field. The data sets are processed with 10-fold cross-validation, and the sequence window sizes follow those reported by previous researchers. The experimental results show that prediction accuracy exceeds 90%, far higher than the first published results on this problem and about 2% higher than the latest published work. The Matthews correlation coefficient of the predictions is 0.807. As the proportion of negative samples increases, the Matthews correlation coefficient decreases gradually, down to a minimum of 0.326. This indicates that the imbalance between positive and negative samples has a strong influence on the experimental results.

In summary, protein phosphorylation site prediction was studied with several prediction models using the new feature extraction method, and the ensemble model performed well. Phosphoglycerylation site prediction was studied with the flexible neural tree model, which improved on the prediction performance of the latest published results.
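The three computational pieces named in the abstract (frequency features augmented with residue-group information, combination of three classifiers by vote, and the Matthews correlation coefficient) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The residue grouping in GROUPS is a hypothetical example, since the abstract does not list the groups used; the combination rule is assumed to be a simple majority vote; and the function names window_features, majority_vote, and mcc are illustrative only.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by side-chain property (an assumption for illustration;
# the abstract does not say which residue groups the thesis uses).
GROUPS = {
    "hydrophobic": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}


def window_features(window):
    """Return 20 amino acid frequencies followed by one frequency per group."""
    if not window:
        raise ValueError("window must contain at least one residue")
    counts = Counter(window)
    n = len(window)
    aa_freq = [counts[a] / n for a in AMINO_ACIDS]
    group_freq = [sum(counts[a] for a in members) / n
                  for members in GROUPS.values()]
    return aa_freq + group_freq


def majority_vote(predictions):
    """Combine per-model lists of 0/1 labels; each site gets the majority label."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*predictions)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # A made-up 23-residue window, matching the window length reported as best.
    window = "A" * 11 + "S" + "K" * 9 + "DE"
    print(len(window_features(window)))  # 20 residue + 4 group features
    # Made-up outputs of three models on three candidate sites.
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

With three voters and binary labels a tie cannot occur, which is one practical reason to combine an odd number of models. Because the MCC uses all four cells of the confusion matrix, it falls when a growing share of negative samples produces more false positives, even if plain accuracy stays high; this is consistent with the drop from 0.807 to 0.326 reported above.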
Keywords/Search Tags: ensemble learning, flexible neural tree, protein post-translational modification, site prediction