Method Development Of Protein Functional Site Prediction Based On Sequence Information

Posted on: 2015-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Chen
GTID: 1268330428960700
Subject: Bioinformatics
Abstract/Summary:
Identification of protein functional sites is of great importance for understanding the biological function of protein molecules, and in silico prediction of these sites has become an important topic in bioinformatics. This thesis focuses on the prediction of two types of protein functional sites: ubiquitination sites and zinc-binding sites. First, based on the characteristics of ubiquitination sites in yeast and human, the author developed two species-specific ubiquitination site prediction tools (CKSAAP_UbSite and hCKSAAP_UbSite). Then, the author conducted a comprehensive evaluation of existing ubiquitination site prediction tools on four datasets from different species. Finally, after an intensive feature analysis comparing zinc-binding sites with non-zinc-binding sites, multiple prediction methods and features were integrated into a prediction tool named ZincExplorer.

As one of the most important reversible protein post-translational modifications (PTMs), ubiquitination has been reported to be involved in many biological processes and is closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and their ubiquitination sites. First, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences in yeast. Its distinguishing feature is that it uses the composition of k-spaced amino acid pairs (CKSAAP) surrounding a query site (i.e., any lysine in a query sequence) as input to a Support Vector Machine (SVM).
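To make the CKSAAP idea concrete, the following is a minimal Python sketch of how such an encoding could be computed for the sequence window around each lysine. It is an illustration only, not the thesis's implementation: the function names, the flank size, the maximum spacing `k_max`, the padding character and the restriction to the 20 standard amino acids are all assumptions, since the abstract does not state these settings.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted; the pad
# character "X" used at sequence ends is ignored by the encoder.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=2):
    """Return the composition of k-spaced amino acid pairs for one window.

    For each spacing k = 0..k_max, count every ordered residue pair that is
    separated by exactly k positions, then divide by the number of pair
    positions available at that spacing. k_max is a hypothetical default.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skips pairs that contain the pad character
                counts[pair] += 1
        denom = max(n_positions, 1)  # avoid division by zero on short windows
        features.extend(counts[p] / denom for p in PAIRS)
    return features


def lysine_windows(sequence, flank=3, pad="X"):
    """Yield (position, window) for every lysine, padding the sequence ends."""
    padded = pad * flank + sequence + pad * flank
    for pos, residue in enumerate(sequence):
        if residue == "K":
            yield pos, padded[pos:pos + 2 * flank + 1]


if __name__ == "__main__":
    for pos, window in lysine_windows("MKAKLT", flank=2):
        vector = cksaap(window, k_max=1)
        print(pos, window, len(vector))
```

Each window yields 400 values per spacing, so the resulting fixed-length vector could be passed to any standard classifier, such as an SVM.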
To facilitate the community’s research, a web server for CKSAAP_UbSite was constructed and is freely available at http://protein.cau.edu.cn/cksaap_ubsite/; it can also be used for proteome-wide ubiquitination site identification.

Recent developments in mass spectrometry (MS)-based proteomics have greatly expedited proteome-wide analysis of PTMs, and more than ten thousand ubiquitination sites have been determined in human. Given the complicated sequence context of human ubiquitination sites, the author developed a novel human-specific ubiquitination site predictor by integrating multiple complementary classifiers. First, an SVM classifier was constructed based on the CKSAAP encoding, which had been used in the author's earlier yeast ubiquitination site predictor. To further exploit the patterns and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding and the protein aggregation propensity encoding, respectively. After the four classifiers were integrated by logistic regression, the resulting predictor, termed hCKSAAP_UbSite, achieved an area under the ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. For the convenience of users, hCKSAAP_UbSite has been integrated into the existing CKSAAP_UbSite server.

In the past several years, a few tools have been developed for the prediction of ubiquitination sites, but users are frequently confused by differences in the prediction algorithms adopted, the features selected, and the performance across species. To address this problem, the author first compared and analyzed five popular standalone or web-server tools on four large datasets from different species. Then, the author summarized the ease of use of the tools under investigation to help users choose among them more efficiently.
Finally, the author tested most of the features used in previous prediction tools and ranked them by performance to determine which features contribute significantly to predicting ubiquitination sites in a specific species.

As one of the most important trace elements within an organism, zinc has been shown to be involved in numerous biological processes and is closely implicated in various diseases. The zinc ion is important for proteins to perform their functional roles. Motivated by the biological importance of zinc, the author proposed a new hybrid method called ZincExplorer to predict zinc-binding sites from protein sequences. It integrates the outputs of three different types of predictors, namely SVM-, cluster- and template-based predictors. ZincExplorer can predict four types of zinc-binding amino acids, referred to as CHEDs (i.e., CYS, HIS, ASP and GLU). It achieved a high AURPC (Area Under Recall-Precision Curve) of 0.851, and a precision of 85.6% (specificity = 98.4%, MCC = 0.747) at 70.0% recall for the CHEDs in a 5-fold cross-validation test. Moreover, ZincExplorer can also identify the interdependent relationships (IRs) of predicted zinc-binding sites bound to the same zinc ion, which makes it a useful tool for providing in-depth zinc-binding site annotation. To facilitate the research community, an online web server for ZincExplorer was constructed, which is freely accessible at http://protein.cau.edu.cn/ZincExplorer/.
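For readers less familiar with the evaluation measures quoted above (precision, recall, specificity and MCC), the short Python sketch below shows how they are derived from the four cells of a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the thesis; they only illustrate the standard definitions.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the thesis).
    for name, value in binary_metrics(tp=70, fp=10, tn=890, fn=30).items():
        print(f"{name}: {value:.3f}")
```

Because binding sites are far rarer than non-binding residues, specificity can be very high even when precision is moderate, which is one reason measures such as precision at a fixed recall and MCC are reported alongside it.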
Keywords/Search Tags: Functional sites, Ubiquitination, Zinc-binding sites, Prediction, Machine learning, Feature selection