Linear B Cell Epitope Prediction Based On Artificial Intelligence

Posted on: 2024-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liu
Full Text: PDF
GTID: 2530307106465984
Subject: Biomathematics
Abstract/Summary:
The linear B cell epitope (BCE) is a biologically active peptide that is widely used in vaccine design, immunodiagnostic testing, antibody preparation, and disease prevention and control, so accurate identification of linear BCEs is important. Experimental detection of linear BCEs is expensive and time-consuming, and it cannot keep pace with the volume of protein sequence data that now needs to be screened. Computational methods, especially those based on artificial intelligence, can identify linear BCEs at a lower cost, but the performance of the methods available so far remains unsatisfactory. This thesis proposes LBCE-XGB, a linear BCE prediction method based on the extreme gradient boosting algorithm. To represent the biological information hidden in peptide sequences, an epitope-specific BERT model was pre-trained to extract embedding vectors of residues; these were used alongside five other types of sequence features, including amino acid composition and amino acid antigenicity scales. In five-fold cross-validation, LBCE-XGB achieved an AUROC of 0.845, the best among the models developed in this work. On an independent test set, the model achieved an AUROC of 0.838, which is also significantly higher than that of other state-of-the-art methods. These results show that BERT embeddings are an effective feature for predicting linear BCEs and that LBCE-XGB can serve as a low-cost, high-precision method for linear BCE detection.

To further verify the ability of BERT embedding features to represent bioactive peptides, the linear BCE prediction work was extended to the prediction of neuropeptides. Neuropeptides play an important role in different physiological processes and are associated with a wide range of diseases, so identifying them can greatly help the study of these processes and the treatment of neurological diseases. This work proposes a new neuropeptide prediction model, NeuroPpred-SVM. Fourteen machine learning and deep learning models were first built separately on seven feature extraction methods, and a support vector machine was finally selected to predict neuropeptides from BERT embeddings and other sequence features. The experimental results show that the model achieves an AUROC of 0.969 on the training set and 0.966 on the independent test set. A comparison with four recent models on the independent test set shows that NeuroPpred-SVM performs best. The results of this study suggest that a framework similar to the one used for linear BCE prediction can also achieve good results in neuropeptide prediction, and NeuroPpred-SVM is expected to be a high-accuracy, low-cost tool for neuropeptide identification.
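As an illustration of two ideas named in the abstract, the following is a minimal sketch of one sequence feature (amino acid composition) and of the AUROC metric used to report results. It is not the thesis implementation: the function names, the example peptide, and the toy labels and scores are all assumptions made for illustration, and the BERT embedding, XGBoost, and SVM components are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide: str) -> list[float]:
    """Return the fraction of each standard amino acid in the peptide.

    Non-standard characters are ignored. The result is a 20-dimensional
    vector whose entries sum to 1.
    """
    peptide = peptide.upper()
    counts = Counter(res for res in peptide if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def auroc(labels: list[int], scores: list[float]) -> float:
    """Compute AUROC as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical peptide and toy predictions, for demonstration only.
    features = amino_acid_composition("AAGG")
    print(features[AMINO_ACIDS.index("A")], features[AMINO_ACIDS.index("G")])
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a pipeline of the kind the abstract describes, a composition vector like this would be concatenated with the other feature types before being passed to the classifier, and the AUROC would be computed on held-out folds or on an independent test set. The pairwise AUROC above is quadratic in the number of examples, which is fine for a small demonstration but would normally be replaced by a rank-based routine on larger datasets.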
Keywords/Search Tags: Linear B cell epitope, BERT, Machine learning, Natural language processing, Neuropeptide