
Research On Prediction Of Anticancer Peptides Based On Sequence Information

Posted on: 2022-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Liang
GTID: 2491306515456434
Subject: Master of Engineering
Abstract/Summary:
Anticancer peptides (ACPs) are bioactive peptides with anticancer activity. Because their inhibitory effect is directed at tumor cell membranes, they do not damage healthy human cells, and they have become a hot spot in research on new anti-tumor drugs. Many ACPs are currently in clinical trials, so ACP prediction has good application prospects for tumor treatment. Conventional ACP identification relies on biological experimental techniques, which detect ACPs with high accuracy but are costly and labour-intensive. With the advancement of genome projects and molecular sequencing technology, the amount of protein sequence data grows exponentially every year, and handling such massive sequence data is a major challenge for experimental techniques. Computational methods that identify ACPs from large-scale sequence data are highly efficient and can quickly pinpoint highly credible ACP candidates, and they have become a supplement to biological experiments. In this thesis, novel machine learning-based computational methods for ACP identification are proposed to handle massive sequence data. The main contributions are as follows:

(1) Feature engineering for ACP prediction
In recent years, many machine learning-based methods for ACP prediction have been proposed. However, most of them lack a comprehensive evaluation of the sequence features used for ACP prediction. In this thesis, a total of 23 sequence-based and physicochemical features suitable for ACP prediction are extracted and comprehensively evaluated with ten-fold cross-validation tests. Five features are selected: Amino Acid Composition (AAC), Pseudo-Amino Acid Composition (PAAC), CTD Distribution (CTDD), Composition of k-spaced Amino Acid Pairs (CKSAAP) and Quasi-Sequence Order (QSOrder). These five features are used to encode the input peptide sequences and are standardized with the Z-score method. A two-step feature selection algorithm, combining F-score ranking with sequential forward search, is then used to select a near-optimal feature subset from the original feature set. This work provides a solid basis for improving ACP prediction performance.

(2) Construction of an ensemble learning-based ACP prediction model
Existing predictors greatly facilitate the identification of ACPs, but most of them leave several issues to be addressed. Their predictive performance and generalization ability need to be improved, and because these machine learning-based methods train the model with a single algorithm, their performance is not robust in some cases. Building on the feature engineering study above, an ACP prediction method based on Stacking ensemble learning, termed ACPred-StackL, is proposed. The method consists of two layers of classification models: the first layer uses k-nearest neighbors (KNN), Gaussian naive Bayes (GaussianNB), LightGBM and a support vector machine (SVM) as base classifiers, and the second layer uses a logistic regression classifier. ACPred-StackL achieved high predictive performance in ten-fold cross-validation and independent testing on the benchmark dataset, and in five-fold cross-validation, leave-one-out cross-validation and independent testing on other ACP datasets. Because ACPred-StackL is a black-box prediction model, the SHAP algorithm is used to illustrate the impact of the selected features on the model output. To make the model easy for researchers to use, an online prediction tool is also developed to support their research on ACP prediction.

(3) Prediction of ACPs among peptides with different functions
Peptides with different functions can have highly similar sequences, which makes it very challenging to distinguish ACPs from other peptide sequences. Most existing predictors do not consider identifying ACPs among other active peptides with various functions. This work constructs a dataset of ACPs, antibacterial peptides, cell-penetrating peptides, anti-inflammatory peptides, anti-angiogenic peptides, antiviral peptides and surface-binding peptides, in which ACP sequences are treated as positive samples and all other peptide sequences as negative samples. To find a better model for predicting ACPs among these other active peptides, nine common deep learning frameworks are built on different feature combinations, and the feature combinations best suited to the deep learning frameworks are selected. Bayesian optimisation and an attention mechanism are further used to improve the performance of the deep learning frameworks. Finally, the machine learning algorithms, ensemble learning algorithms and deep learning frameworks are comprehensively evaluated. The experimental results show that CNN_LSTM (with attention) outperforms the other models in AUC, MCC and accuracy (Acc).
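To make the sequence encoding in part (1) concrete, the sketch below computes two of the five selected descriptors, AAC and CKSAAP, in plain Python. It is an illustration under stated assumptions, not the thesis's code: the residue ordering, the gap range `k_max=3`, the choice to skip pairs with non-standard residues, and the example sequence are all hypothetical choices made here.

```python
from itertools import product

# The 20 standard amino acids; this ordering fixes the feature-vector layout (assumed).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq: str) -> list[float]:
    """Amino Acid Composition: relative frequency of each standard residue."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def cksaap(seq: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For gap k, each pair (seq[i], seq[i + k + 1]) is counted and the counts are
    normalized by the number of counted pairs, giving 400 values per gap.
    """
    seq = seq.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


# Example: concatenate two descriptors into one input vector (hypothetical peptide).
vector = aac("GLFDIIKKIAESF") + cksaap("GLFDIIKKIAESF", k_max=3)
print(len(vector))  # 20 + 4 * 400 = 1620
```

The other three descriptors (PAAC, CTDD, QSOrder) would be appended to the same vector in the same way; feature toolkits for peptides typically provide them, but their exact parameterization in the thesis is not stated in this abstract.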
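The standardization and two-step feature selection described in part (1) can be sketched as follows. This is one plausible reading, assumed here rather than taken from the thesis: Z-score standardization per column, an F-score filter that keeps the `top_m` highest-scoring features, then a greedy sequential forward search over those candidates. The `evaluate` callback (for example, cross-validated accuracy of a classifier on a column subset) and `top_m` are hypothetical parameters.

```python
import statistics
from typing import Callable, Sequence

Matrix = list[list[float]]


def zscore(X: Matrix) -> Matrix:
    """Standardize each feature column to zero mean and unit (population) variance."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0 for constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


def f_scores(X: Matrix, y: Sequence[int]) -> list[float]:
    """F-score per feature: between-class separation over within-class sample variance.

    Labels are 0/1 and each class needs at least two samples.
    """
    scores = []
    for col in zip(*X):
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        m_all, m_pos, m_neg = statistics.fmean(col), statistics.fmean(pos), statistics.fmean(neg)
        between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        within = statistics.variance(pos) + statistics.variance(neg)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def sequential_forward_search(candidates: Sequence[int],
                              evaluate: Callable[[list[int]], float]) -> tuple[list[int], float]:
    """Greedy SFS: repeatedly add the candidate feature that most improves evaluate()."""
    selected: list[int] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        score, idx = max((evaluate(selected + [j]), j) for j in remaining)
        if score <= best:
            break  # no remaining feature improves the criterion
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best


def two_step_selection(X: Matrix, y: Sequence[int], top_m: int,
                       evaluate: Callable[[list[int]], float]) -> tuple[list[int], float]:
    """Step 1: keep the top_m features by F-score. Step 2: run SFS over them."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sequential_forward_search(ranked[:top_m], evaluate)
```

Filtering by F-score first keeps the wrapper step affordable: SFS calls `evaluate` roughly `top_m` times per added feature, which would be expensive on the full concatenated descriptor set.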
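The two-layer Stacking design of ACPred-StackL in part (2) hinges on training the second-layer logistic regression on out-of-fold predictions of the first-layer models, so the meta-learner never sees predictions made on a base model's own training data. The dependency-free sketch below shows that mechanism with only two of the four base learners (k-NN and Gaussian naive Bayes, both written from scratch); SVM and LightGBM are omitted for brevity, and the fold count, `k=3` neighbors, learning rate and toy data are assumptions. In practice one would more likely use scikit-learn's `StackingClassifier` with `KNeighborsClassifier`, `GaussianNB`, `SVC`, `lightgbm.LGBMClassifier` and a `LogisticRegression` final estimator.

```python
import math
from typing import Callable, Sequence

Matrix = list[list[float]]
Model = Callable[[list[float]], float]               # maps one sample to P(y = 1)
Learner = Callable[[Matrix, Sequence[int]], Model]   # trains a Model on (X, y)


def knn_learner(k: int = 3) -> Learner:
    """k-NN: P(y = 1) is the positive fraction among the k closest training samples."""
    def fit(X: Matrix, y: Sequence[int]) -> Model:
        def predict(x: list[float]) -> float:
            nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(X, y))[:k]
            return sum(yi for _, yi in nearest) / len(nearest)
        return predict
    return fit


def gaussian_nb(X: Matrix, y: Sequence[int]) -> Model:
    """Gaussian naive Bayes with per-class feature means and variances."""
    params = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9  # variance smoothing
                     for col, m in zip(cols, means)]
        params[c] = (math.log(len(rows) / len(X)), means, variances)

    def predict(x: list[float]) -> float:
        log_post = {c: lp + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                                for xi, m, v in zip(x, ms, vs))
                    for c, (lp, ms, vs) in params.items()}
        top = max(log_post.values())  # subtract the max before exponentiating
        e0, e1 = math.exp(log_post[0] - top), math.exp(log_post[1] - top)
        return e1 / (e0 + e1)
    return predict


def logistic_regression(X: Matrix, y: Sequence[int], lr: float = 0.5, epochs: int = 500) -> Model:
    """Logistic regression trained by full-batch gradient descent; w[-1] is the bias."""
    w = [0.0] * (len(X[0]) + 1)

    def prob(x: list[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = prob(x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)
    return prob


def stratified_folds(y: Sequence[int], k: int) -> list[list[int]]:
    """Deal the indices of each class round-robin into k folds."""
    folds: list[list[int]] = [[] for _ in range(k)]
    for c in (0, 1):
        members = [i for i, t in enumerate(y) if t == c]
        for n, i in enumerate(members):
            folds[n % k].append(i)
    return folds


def fit_stacking(X: Matrix, y: Sequence[int], base_learners: list[Learner],
                 meta_learner: Learner, k: int = 5) -> Model:
    """Layer 1: out-of-fold base predictions. Layer 2: meta-learner trained on them."""
    oof = [[0.0] * len(base_learners) for _ in X]
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        for b, learner in enumerate(base_learners):
            model = learner(X_tr, y_tr)
            for i in fold:
                oof[i][b] = model(X[i])
    meta = meta_learner(oof, y)
    full = [learner(X, y) for learner in base_learners]  # refit base models on all data
    return lambda x: meta([m(x) for m in full])


# Toy usage with two well-separated clusters (illustrative data only).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2],
     [3.0, 3.0], [3.2, 2.9], [2.9, 3.1], [3.1, 3.2], [3.0, 2.8]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
stack = fit_stacking(X, y, [knn_learner(3), gaussian_nb], logistic_regression)
print(stack([3.0, 3.0]) > 0.5, stack([0.1, 0.1]) < 0.5)
```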
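Part (3) adds an attention mechanism to CNN_LSTM but the abstract does not say which variant. As an assumption, the sketch below shows one common form, additive attention pooling over a sequence of hidden states: each state h_t gets a score v · tanh(W h_t), the scores are normalized with a softmax into weights alpha_t, and the context vector is the weighted sum of the states. `W` and `v` are hypothetical stand-ins for weights that a real model would learn; the nested-list arithmetic is only for readability.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: list[list[float]], W: list[list[float]],
                   v: list[float]) -> tuple[list[float], list[float]]:
    """Additive attention: score_t = v . tanh(W h_t), context = sum_t alpha_t * h_t."""
    scores = []
    for h in hidden_states:
        proj = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in W]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, proj)))
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas


# Hypothetical hidden states for a 3-step sequence and untrained attention weights.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, alphas = attention_pool(states, W=[[1.0, 0.0], [0.0, 1.0]], v=[1.0, 1.0])
```

With all-zero `v` every step gets the same weight and the context reduces to the mean of the hidden states; learned weights let the model emphasize the residues that matter most for the ACP decision.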
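The final comparison in part (3) uses AUC, MCC and accuracy (Acc). For reference, the sketch below computes all three for a binary task with positives labeled 1. AUC is computed here as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), which equals the area under the ROC curve; the 0.5 decision threshold and the toy labels and scores are assumptions for illustration.

```python
import math
from typing import Sequence


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any confusion-matrix margin is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(round(accuracy(y_true, y_pred), 4), round(mcc(y_true, y_pred), 4), round(auc(y_true, scores), 4))
# expected: 0.6667 0.3333 0.8889
```

MCC is often preferred alongside Acc for this kind of dataset because it stays informative when positives (ACPs) are much rarer than negatives (all other functional peptides).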
Keywords/Search Tags: Anticancer peptide, Ensemble learning, Deep learning, Attention mechanism