Computational Identification Of Protein Ubiquitination Sites In Homo Sapiens Using Sequence Contextual Features

Posted on: 2014-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: B He
Full Text: PDF
GTID: 2250330422452883
Subject: Biomedical engineering
Abstract/Summary:
Protein ubiquitination is an important post-translational modification (PTM) that plays key regulatory roles in many cellular functions, such as protein degradation, signal transduction, and DNA repair. Some studies suggest that malfunction of the protein ubiquitination system is associated with many human diseases. A deeper understanding of the ubiquitination system could therefore provide an important theoretical basis for drug development and disease treatment. The key step in protein ubiquitination is the attachment of ubiquitin to specific lysine residues in the target protein through isopeptide bonds; such a lysine residue is called a ubiquitination site. Identifying ubiquitination sites is therefore essential for a comprehensive understanding of the regulation of the complex ubiquitination system and of the molecular mechanisms of cellular functions. In this thesis, the features that influence protein ubiquitination were studied on the basis of protein sequences in Homo sapiens, with emphasis on predicting protein ubiquitination sites in Homo sapiens. The main research work and innovative results are summarized as follows:

First, based on high-throughput protein ubiquitination site data in Homo sapiens, the sequence contextual features centered on the ubiquitinated lysine were described, including amino acid composition, entropy density profile (EDP), N-order coupled composition (N-OCC) of amino acids, evolutionary information of amino acids, physicochemical properties of amino acids, structural information, and other post-translational modifications. A machine learning classifier, random forest (RF), was then employed to predict ubiquitination sites in Homo sapiens. The sensitivity, specificity, and accuracy of the proposed model were 63.6%, 66.29%, and 64.95%, respectively.
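As a rough illustration of what a lysine-centered sequence feature looks like, the sketch below extracts a fixed-size window around every lysine in a protein sequence and computes the amino acid composition of that window. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the padding symbol "X" for positions beyond the sequence ends, and the choice to include the central lysine in the composition are all hypothetical. The default window size of 23 is the value this abstract reports as optimal.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def lysine_windows(sequence, window_size=23):
    """Yield (1-based position, window) for each lysine, with the lysine centered."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the lysine sits at the center")
    half = window_size // 2
    seq = sequence.upper()
    padded = PAD * half + seq + PAD * half
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of seq sits at index i + half in padded,
            # so the centered window starts at index i.
            yield i + 1, padded[i:i + window_size]


def amino_acid_composition(window):
    """Return the fraction of each standard amino acid, ignoring padding."""
    residues = [r for r in window if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on an all-padding window
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from the thesis data.
    for position, window in lysine_windows("MKTAYIAKQR", window_size=5):
        print(position, window, amino_acid_composition(window))
```

Each composition vector has 20 entries in the order of AMINO_ACIDS and could serve as one block of a larger feature vector passed to a classifier such as a random forest.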
The Matthews correlation coefficient (MCC) and balanced error of the proposed model on the prediction test set were 0.299 and 0.355, respectively. Compared with existing methods such as UbiPred, UbPred, and CKSAAP, the proposed model shows advantages in identifying protein ubiquitination sites in Homo sapiens.

Second, the above prediction results were discussed. By comparing model performance across different window sizes, the optimal window size was set to 23 residues. RF was selected as the classifier after comparison with a support vector machine (SVM). The correlations of ubiquitination with amino acid composition, disordered regions, phosphorylation, and acetylation were also analyzed. To determine whether ubiquitination differs among different kinds of proteins, nuclear and cytoplasmic proteins were extracted from all human proteins. Models constructed separately for each kind of protein were compared, revealing no conspicuous differences in the ubiquitination process among different kinds of proteins.

Finally, a comprehensive database of protein ubiquitination sites in mammals, mUbiSiDa, and a predictor of human ubiquitination sites, PredHUbi, were constructed based on our experimental data. mUbiSiDa and PredHUbi were designed as widely usable tools with user-friendly interfaces for biologists and biomedical researchers, to facilitate further research on protein ubiquitination, biological networks, and functional proteomics.
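The evaluation measures quoted in this abstract (sensitivity, specificity, accuracy, MCC, and balanced error) are conventionally derived from the four counts of a binary confusion matrix. The sketch below shows those standard definitions. The abstract does not define its balanced error, so the code assumes the common balanced error rate, one minus the mean of sensitivity and specificity. The function name and the example counts are hypothetical and are not results from the thesis.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Assumed definition: balanced error rate = 1 - mean(sensitivity, specificity).
    balanced_error = 1.0 - (sensitivity + specificity) / 2.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "balanced_error": balanced_error,
    }


if __name__ == "__main__":
    # Illustrative counts only: 100 positive and 100 negative test sites.
    for name, value in binary_metrics(tp=60, fp=30, tn=70, fn=40).items():
        print(f"{name}: {value:.3f}")
```

On a test set with equal numbers of positive and negative sites, accuracy equals the mean of sensitivity and specificity, which is why MCC and balanced error are often reported alongside it when classes are unbalanced.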
Keywords/Search Tags: protein in Homo sapiens, ubiquitination site, sequence contextual features, random forest, support vector machine