
Method Development For The Prediction Of Two Types Of Lysine Post-translational Modification Sites Based On Sequence Information

Posted on: 2017-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Md. Mehedi Hasan
GTID: 1220330482492690
Subject: Bioinformatics
Abstract/Summary:
Lysine is one of the most frequent sites of post-translational modification (PTM) in both prokaryotic and eukaryotic cells. To understand the molecular mechanisms of lysine modification, it is important to identify lysine modification sites and their degrees of modification accurately. Several experimental methods have been developed to identify lysine modification sites, but they are generally laborious and costly. Computational methods that can accurately predict potential lysine PTM sites from protein sequence information are therefore highly desirable. This thesis focuses on the prediction of two types of lysine PTM sites: pupylation and succinylation.

Prokaryotic proteins are regulated by pupylation, a type of PTM that contributes to cellular function in bacteria. In pupylation, tagging with the prokaryotic ubiquitin-like protein (Pup) is functionally analogous to ubiquitination: it marks target proteins for proteasomal degradation. In this thesis, the author performs a systematic analysis and develops a novel predictor, termed pbPUP, for the accurate prediction of pupylation sites. In particular, a sequence encoding scheme, the profile-based composition of k-spaced amino acid pairs (pbCKSAAP), is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. A support vector machine (SVM) classifier is then trained on the pbCKSAAP encoding. The final pbPUP predictor achieves an AUC of 0.849 in 10-fold cross-validation and outperforms other existing predictors on a comprehensive independent test dataset. The pbPUP web server and curated datasets are freely available at http://protein.cau.edu.cn/pbPUP/.

Lysine succinylation is an emerging protein PTM that also plays an important role in regulating cellular processes in both eukaryotic and prokaryotic cells. In this thesis, a novel computational tool termed SuccinSite is developed to predict protein succinylation sites by incorporating three sequence encodings: the composition of k-spaced amino acid pairs, binary encoding, and amino acid index properties. A random forest classifier is then trained on these encodings to build the predictor. SuccinSite achieves an AUC of 0.802 in 5-fold cross-validation and performs significantly better than existing predictors on a comprehensive independent test set. Furthermore, informative features and predominant rules (i.e., feature combinations) are extracted from the trained random forest model to improve the interpretability of the predictor. Finally, the author compiles a database covering 4,411 experimentally verified succinylated proteins with 12,456 lysine succinylation sites. The web server, training and test datasets, source code, and database are freely available at http://systbio.cau.edu.cn/SuccinSite/.

Taken together, these results suggest that the proposed predictors, pbPUP and SuccinSite, are helpful computational resources for predicting lysine pupylation and succinylation sites.
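
As an illustration of the composition of k-spaced amino acid pairs named in the abstract, the following is a minimal sketch of the plain (not profile-based) encoding as it is commonly formulated: for each gap k, count every ordered residue pair separated by k positions and divide by the number of such pairs in the fragment. It is not the thesis's implementation. The function name `cksaap`, the default `k_max`, the 20-letter alphabet, and the handling of non-standard characters are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids; any other character (e.g. a padding symbol)
# is simply ignored when counting pairs. This choice is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(fragment, k_max=2):
    """Return the k-spaced pair composition of a sequence fragment.

    For each gap k in 0..k_max, the 400 pair frequencies are appended,
    giving a vector of length 400 * (k_max + 1).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of positions i for which both i and i + k + 1 are in range.
        n_windows = max(len(fragment) - k - 1, 0)
        for i in range(n_windows):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        denom = n_windows if n_windows > 0 else 1  # avoid division by zero
        features.extend(counts[p] / denom for p in PAIRS)
    return features


if __name__ == "__main__":
    # Hypothetical fragment centred on a lysine (K); not taken from the thesis data.
    vector = cksaap("MTAEKLSDV", k_max=2)
    print(len(vector))
```

A vector of this kind, computed for each fragment around a candidate lysine, is the sort of input a classifier such as an SVM or a random forest could be trained on. The profile-based variant used by pbPUP additionally draws on evolutionary information and is not reproduced here.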
Keywords/Search Tags: Post-translational modification, Machine learning, Support vector machine, Random forest, Pupylation, Prediction, Succinylation, Feature encoding