
Prediction Of Neuropeptide Precursor And Its Cleavage Site Based On Machine Learning

Posted on: 2022-10-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y Wang  Full Text: PDF
GTID: 2480306524982389  Subject: Biophysics
Abstract/Summary:
Neuropeptides are a class of bioactive peptides about 5 to 50 amino acids in length. They are ubiquitous in the central and peripheral nervous systems and play crucial roles in activating the signal cascades involved in reproduction, metabolism, sensation, memory, learning and other life activities. Neuropeptides are derived from neuropeptide precursor proteins, which are translated directly from mRNAs and usually consist of a signal peptide, one or several neuropeptide sequences and some other sequences. After proteolysis and a series of post-translational modifications of the precursor, one or more mature neuropeptides are produced. Given the exponential growth in protein sequences of unknown function and the limited number of known neuropeptide types, accurate identification of neuropeptide precursor sequences and their cleavage sites is significant for neuroscience, and for neuropeptide research in particular. However, existing research methods rely mainly on experiments, such as site-directed mutagenesis and animal experiments, which are time-consuming and laborious, and sometimes unsatisfactory because their accuracy is relatively low. With the rapid development of bioinformatics, computational methods have become widely used in life science research, including protein structure modeling, RNA-RNA interaction, drug design and many other fields, and neuropeptide research is no exception.

In this thesis, support vector machine (SVM), random forest and other machine learning methods were applied to two tasks. First, an SVM model based on pseudo amino acid composition was constructed to predict neuropeptide precursor sequences. The dataset, collected from a published article, comprises 405 neuropeptide precursors (positive data) and 405 non-neuropeptide precursors with the same length distribution as the neuropeptide precursor sequences (negative data). The prediction accuracy of this model reached 87.14%, with an AUC of 0.9391.

Second, using SVM, random forest, K-nearest neighbors, neural networks and other machine learning methods, we constructed several models based on different featurization methods related to amino acid sequence composition, distribution and physicochemical properties. The original data source was the set of neuropeptide precursor sequences from the first task, which underwent further processing according to their annotations in UniProt. Model construction and prediction were then carried out on the resulting 937 positive samples and 937 randomly selected negative samples of the same sequence length. The best-performing model was the SVM with enhanced amino acid composition features, with an accuracy of 90.37% and an AUC of 0.9576. We therefore developed a prediction tool called NeuroCS based on this model. For convenience, the tool is provided as a free online service: http://i.uestc.edu.cn/NeuroCS/dist/index.html#/...
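
The abstract does not spell out how the enhanced amino acid composition features are computed. As a minimal sketch, assuming the commonly used definition (amino acid frequencies computed in a fixed-size window that slides along a fixed-length sequence fragment, with the per-window vectors concatenated), the featurization might look as follows. The function names `aac` and `eaac`, the window size of 5 and the example fragment are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each standard residue in seq.

    Non-standard residues count toward the length but get no feature,
    so the fractions sum to 1 only for purely standard sequences.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def eaac(seq, window=5):
    """Enhanced amino acid composition (assumed definition).

    Slides a window of the given size along seq one residue at a time
    and concatenates the 20-dimensional composition of every window,
    giving (len(seq) - window + 1) * 20 features.
    """
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    features = []
    for start in range(len(seq) - window + 1):
        features.extend(aac(seq[start:start + window]))
    return features


# Hypothetical 9-residue fragment, used only to show the output shape.
example = "GSKRAAKRS"
vec = eaac(example, window=5)
print(len(vec))  # (9 - 5 + 1) * 20 features
```

Feature vectors of this kind, computed for equal-length positive and negative fragments, could then be passed to a classifier such as an SVM; that training step is not shown here.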
Keywords/Search Tags:neuropeptides, neuropeptide precursors, cleavage sites, machine learning, support vector machine