Predicting Carbonylation Sites Based On Machine Learning Methods

Posted on: 2022-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhang
GTID: 2480306524482434
Subject: Biophysics
Abstract/Summary:
Protein carbonylation (PCO) is one of the most important non-enzymatic, oxidative stress-induced post-translational modifications (PTMs). It is generally characterized by its stability, its irreversibility and its relatively early formation. Many studies have shown that protein carbonylation can cause irreversible changes in protein structure, leading to the loss of the protein's original biological function, disordered cell and tissue function, decreased cell viability and even cell death. Moreover, it has been shown to be closely related to apoptosis, aging, and the mechanisms of a variety of chronic and neurodegenerative diseases. Protein carbonylation is therefore regarded as a biomarker of oxidative stress, and its role in the senescence of cells, tissues and organs has attracted particular attention.

Identifying carbonylation sites with biochemical experimental techniques is costly and time-consuming. In addition, protein carbonylation is a highly dynamic process with many modification forms. Relying solely on traditional biochemical detection to identify carbonylation sites would therefore seriously limit progress in this area. With improvements in high-throughput mass spectrometry, large amounts of protein carbonylation site data have been generated, providing a data foundation for the systematic study of protein carbonylation. It is therefore necessary to develop computational methods that identify carbonylation sites quickly and accurately. Such methods can not only provide key clues to the onset and progression of diseases, but also serve as powerful tools for experimental biological research.

In this study, we first collected experimental protein carbonylation data from the published literature and constructed a high-quality benchmark data set after strict screening. Subsequently, the position-specific sequence characteristics of amino acid residues
around carbonylation and non-carbonylation sites, as well as the distributions of the physicochemical properties of amino acids, were compared and statistically analyzed. Based on this analysis, we proposed a novel feature encoding scheme, the conical representation, to characterize intrinsic properties of protein sequences. The random forest (RF) classification algorithm was used to construct the prediction model, and performance was evaluated by 10-fold cross-validation. To remove redundant features and build a more robust model, the F-score method and incremental feature selection (IFS) were adopted for feature selection. Both the 10-fold cross-validation results on the training set and the results on an independent test set indicated that the proposed model predicts protein carbonylation sites better than existing models. Finally, for the convenience of researchers, a user-friendly web server called iCarPS and a local software package were built on the best prediction model; they can be freely accessed at http://lin-group.cn/server/iCarPS/.
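The abstract does not say how samples were cut from the proteins or how the position-specific comparison was carried out. A common approach in PTM site prediction, assumed here purely for illustration, is to take a fixed-length window centred on each candidate residue and compare per-position residue frequencies between positive and negative windows. In the Python sketch below, the candidate residues (K, P, R, T, which are commonly reported carbonylation targets), the half-window length of 7, the padding character X and all function names are hypothetical choices. The simple frequency difference stands in for whatever statistical test the thesis actually used.

```python
from collections import Counter

# Hypothetical settings, not taken from the thesis.
CANDIDATES = frozenset("KPRT")  # residues commonly reported as carbonylated
HALF_WINDOW = 7  # gives windows of 2 * 7 + 1 = 15 residues
PAD = "X"  # fills positions that fall beyond either sequence terminus


def extract_windows(sequence, positive_sites, half=HALF_WINDOW):
    """Split the candidate residues of one protein into positive/negative windows.

    positive_sites holds 1-based positions of verified carbonylation sites;
    every other candidate residue becomes a negative sample.
    """
    padded = PAD * half + sequence + PAD * half
    positives, negatives = [], []
    for i, residue in enumerate(sequence):
        if residue not in CANDIDATES:
            continue
        window = padded[i : i + 2 * half + 1]  # centred on sequence[i]
        if i + 1 in positive_sites:
            positives.append(window)
        else:
            negatives.append(window)
    return positives, negatives


def position_frequencies(windows):
    """Relative frequency of each residue at each window position."""
    if not windows:
        return []
    freqs = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        freqs.append({aa: n / len(windows) for aa, n in counts.items()})
    return freqs


def enrichment(pos_windows, neg_windows):
    """Per-position frequency difference: positive minus negative windows."""
    pos_f = position_frequencies(pos_windows)
    neg_f = position_frequencies(neg_windows)
    return [
        {aa: p.get(aa, 0.0) - n.get(aa, 0.0) for aa in set(p) | set(n)}
        for p, n in zip(pos_f, neg_f)
    ]
```

For example, extract_windows("AKPAR", {2}, half=1) returns (["AKP"], ["KPA", "ARX"]): the lysine at position 2 is the only positive, and the arginine window is padded with X at the C-terminus.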
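The abstract names the F-score method without giving its formula. Assuming it is the widely used F-score of Chen and Lin for binary classification, each feature is scored by how far its class means lie from the overall mean, relative to its within-class variances, and features are then ranked by that score. This NumPy sketch is illustrative only; f_scores and rank_features are hypothetical names.

```python
import numpy as np


def f_scores(X, y):
    """F-score of each feature column for a binary problem (Chen & Lin form).

    F(i) = [(mean_pos_i - mean_i)^2 + (mean_neg_i - mean_i)^2]
           / [var_pos_i + var_neg_i]
    where each class variance uses the (n - 1) denominator, so every class
    needs at least two samples.
    X: (n_samples, n_features); y: 1 = carbonylation site, 0 = non-site.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)

    # Zero within-class spread means perfect separation (score = inf),
    # unless the class means also coincide, i.e. the feature is constant.
    scores = np.full(between.shape, np.inf)
    np.divide(between, within, out=scores, where=within > 0)
    scores[(within == 0) & (between == 0)] = 0.0
    return scores


def rank_features(X, y):
    """Feature indices sorted from highest to lowest F-score."""
    return np.argsort(-f_scores(X, y), kind="stable")
```

With X = [[1, 5], [2, 5], [3, 5], [10, 5], [11, 5], [12, 5]] and y = [1, 1, 1, 0, 0, 0], the first feature scores 20.25 and the constant second feature scores 0.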
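Incremental feature selection adds ranked features a few at a time, re-evaluates the classifier on each growing subset, and keeps the subset size with the best cross-validated performance. The sketch below assumes a scikit-learn random forest with 100 trees, stratified shuffled 10-fold splits and accuracy as the selection criterion. The abstract does not state these settings or which metric guided IFS, and the function names and the step parameter are hypothetical.

```python
import numpy as np


def rf_10fold_accuracy(X, y, seed=0):
    """Mean stratified 10-fold CV accuracy of a random forest.

    scikit-learn is imported here so the IFS loop itself only needs NumPy.
    Each class needs at least 10 samples for 10 stratified folds.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


def incremental_feature_selection(X, y, ranking, evaluate=rf_10fold_accuracy, step=1):
    """Score the top-k ranked features for k = step, 2 * step, ...

    ranking: feature indices sorted from most to least relevant.
    Returns (best_k, best_score, curve), where curve lists every (k, score).
    If len(ranking) is not a multiple of step, the last partial block is skipped.
    """
    X = np.asarray(X, dtype=float)
    curve = []
    for k in range(step, len(ranking) + 1, step):
        subset = list(ranking[:k])
        curve.append((k, evaluate(X[:, subset], y)))
    # max() returns the first maximum, so ties favour fewer features.
    best_k, best_score = max(curve, key=lambda item: item[1])
    return best_k, best_score, curve
```

Passing the evaluator in as a function lets another classifier or metric (for example the Matthews correlation coefficient) be substituted without touching the selection loop.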
Keywords/Search Tags: protein carbonylation, sequence information, feature encoding, feature selection, machine learning