The Research Of Protein Post-translational Modifications And Protein-protein Interactions Prediction Methods

Posted on:2014-09-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X W Zhao

GTID:1260330401478910

Subject:Cell biology

Abstract/Summary:

Post-translational modifications and protein-protein interactions underlie the normal biological function of proteins and play a very important role in living organisms. Although more than 350 kinds of protein post-translational modifications have been discovered, only a few of them have been well characterized, owing to limited experimental methods and a lack of sufficient data for analysis. Conventional experimental identification of post-translational modification sites is laborious and expensive, and optimizing the enzymatic reaction is also very time consuming. These factors severely limit the pace of related research. Computational methods have therefore been proposed and applied with varying success. Such methods can predict post-translational modification sites efficiently and accurately, and they can also provide clues for further in vivo or in vitro confirmation. Research on protein-protein interactions helps researchers gain an in-depth, system-level understanding of various biological processes. It can also provide a reliable data source for further exploring the mechanisms of zoonotic diseases and can point out directions for new drug research and development. This dissertation studies the prediction of protein post-translational modification sites and protein-protein interactions. The main results can be summarized as follows:

(1) We propose an ensemble computational method to predict lysine ubiquitylation sites. First, four kinds of features are used to describe each lysine site and the amino acids at its surrounding sites. Second, to reduce the computational complexity and improve the overall accuracy of the predictor, a feature selection method is used to select optimal feature subsets. Finally, an ensemble classifier is built using the optimal feature subsets as input and compared with other predictors. Experimental results show that the method is promising for predicting lysine ubiquitylation sites.

(2) Based on the available pupylation substrate information, we construct a novel predictor of pupylation sites. First, we extract five kinds of features for each protein sequence in the training dataset and use these features to encode each pupylation site and the amino acids at its surrounding sites. Then, the maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS) methods are applied to the feature set to select the optimal feature subset. Finally, the prediction model is built on the optimal feature subset with the assistance of the nearest neighbor algorithm (NNA), and its accuracy is 70.93% under jackknife cross-validation. Biological analysis of the optimal feature subset shows that evolutionary information and physicochemical/biochemical properties play an important role in the recognition of pupylation sites, and that sites 7, 10 and 11 contribute the most to the determination of pupylation sites. The experimental results indicate that the combination of mRMR and IFS can effectively select the optimal feature subset of biological datasets. The model constructed on the optimal feature subset yields satisfactory prediction performance and reveals the biological significance of the selected features.

(3) The composition of k-spaced amino acid pairs (CKSAAP) is used for the first time to predict protein phosphorylation sites, and it improves the prediction accuracy for these sites. When benchmarked against PPRED, DISPHOS and NetPhos, the resulting predictor, CKSAAP_PhSite, achieves a sensitivity of 84.815%, a specificity of 86.07% and an accuracy of 85.43% for serine; a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31% for threonine; and a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21% for tyrosine. Experimental results indicate that the proposed approach is effective and practical. An online web server has been established based on the phosphorylation site prediction model.

(4) We propose a new augmented form of Chou's pseudo amino acid composition to predict protein-protein interactions. First, three groups of descriptors are used to encode each interacting pair, so that each pair is represented by 930 features. Then, principal component analysis (PCA) is used for dimensionality reduction. The resulting feature subset contains few features while retaining as much of the information in the full set as possible. Finally, a protein-protein interaction prediction model is built on the resulting feature subset and compared with other predictors on the Drosophila melanogaster and Helicobacter pylori datasets. Experimental results show that the method is promising for predicting protein-protein interactions.
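The CKSAAP encoding mentioned in result (3) can be illustrated with a minimal sketch. It follows the commonly used definition of the encoding: for each gap k, count the residue pairs separated by exactly k other residues and normalize by the number of such pairs in the fragment. The function name `cksaap`, the maximum gap `k_max=2`, the 20-letter alphabet without a gap symbol, and the example fragment are all assumptions made for illustration; the abstract does not state the window length, the range of k, or any other setting used in CKSAAP_PhSite.

```python
from itertools import product

# The 20 standard amino acids; a padding/gap symbol is deliberately left out
# (assumption: the abstract does not say how fragment ends are handled).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(fragment, k_max=2):
    """Return the CKSAAP feature vector of a sequence fragment.

    For each gap k in 0..k_max, the 400 features are the frequencies of
    ordered residue pairs (a, b) where b follows a with exactly k residues
    in between. The vector has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(fragment) - k - 1  # number of k-spaced pairs
        for i in range(max(n_positions, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
        denom = n_positions if n_positions > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


# Hypothetical fragment, for illustration only.
vec = cksaap("AKSAK", k_max=2)
print(len(vec))                 # 1200 features: 400 pairs x 3 gap values
print(vec[PAIRS.index("AK")])   # frequency of the adjacent pair "AK" (k = 0)
```

In a site predictor, a vector like this would be computed for the fragment centered on each candidate serine, threonine or tyrosine and passed to a classifier. The choice of classifier and of k_max is left open here because the abstract does not specify them.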

Keywords/Search Tags:

Machine Learning, Protein Post-translational Modifications, Protein-protein Interactions, Feature Encoding, Cross-validation test, Ubiquitylation, Pupylation, Phosphorylation

Related items

1	Research On Prediction Of Pupylation And Ubiquitylation Proteins And Their Modification Sites Based On Machine Learning
2	Research On Machine Learning Based Protein Post-Translational Modification Site Predictions
3	Method Development For The Prediction Of Two Types Of Lysine Post-translational Modification Sites Based On Sequence Information
4	Artificial Intelligence Biology Study On Prediction Of Protein Post-translational Modifications And Functions
5	Effects Of Cysteine On Myoglobin And Neuroglobin In Post Translational Modifications
6	Prediction And Functional Analysis For Protein Post-translational Modification Sites
7	Prediction Of Post-translational Modification Cross-talk Within Proteins Using Residue- and Residue Pair-based Features
8	Research On Prediction Model Of Protein Post-translational Modification Cross-talk
9	A General Computer Predicted Method For Modified Sites In Protein
10	Protein Post-translational Modifications And Cross-talks Between Different Modifications