Protein-Protein Interaction: Simple Prediction Tool Development and Studies on Specific Cases

Posted on: 2016-05-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhou
GTID: 1220330467996467
Subject: Bioinformatics
Abstract/Summary:
Deciphering interactions between proteins is one of the great challenges in current biology, and the computational prediction and analysis of protein-protein interactions (PPIs) has become a hot topic in bioinformatics in recent years. In this study, the author first constructs a general PPI predictor based on the non-random codon pair usage among interacting protein pairs. The author then focuses on two more specific but important cases: ubiquitination sites and microtubule-associated proteins (MAPs). A ubiquitination site is the site where a substrate protein interacts with ubiquitination enzymes and is eventually modified by them. Owing to the complexity of the ubiquitination system, the features that determine the specificity of ubiquitination sites are not clearly understood. In the first case study, the author performs statistical analyses to summarize the structural propensities of human ubiquitination sites. MAPs, by definition, are the interaction partners of microtubules. Because the microtubule is a highly dynamic protein complex, its interactions with MAPs are difficult to capture in binary protein interactomes. Therefore, in the second case study, a novel online MAP analysis tool is established based on a manually curated, high-quality MAP dataset.

Recently, simple sequence-based, homology-free encoding schemes have been increasingly applied to develop PPI predictors by means of machine learning methods. Preliminary analysis shows that the codon pair usage of interacting protein pairs in yeast differs significantly from that of random protein pairs. This motivates the development of a novel approach for predicting PPIs, termed CCPPI, which uses the codon pair frequency difference as the input to a support vector machine (SVM) classifier. Preliminary ten-fold cross-validation tests on yeast PPI datasets with balanced positive-to-negative ratios indicate that CCPPI performs better than other sequence-based encoding schemes.
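
The abstract does not spell out how the codon pair frequency difference is computed, so the following is only a minimal sketch of one plausible reading, not the dissertation's confirmed encoding. It assumes each gene is summarized by its relative codon frequencies and that each unordered codon pair contributes one feature that stays the same when the two proteins are swapped. The names `codon_frequencies`, `pair_features`, and `CODON_PAIRS` are hypothetical.

```python
from itertools import combinations_with_replacement, product

# All 64 codons and all unordered codon pairs (64 * 65 / 2 = 2080 pairs).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_PAIRS = list(combinations_with_replacement(CODONS, 2))


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases
            counts[codon] += 1
            total += 1
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def pair_features(cds_a, cds_b):
    """Assumed encoding: one symmetric frequency difference per codon pair."""
    fa = codon_frequencies(cds_a)
    fb = codon_frequencies(cds_b)
    return [
        (abs(fa[i] - fb[j]) + abs(fa[j] - fb[i])) / 2
        for i, j in CODON_PAIRS
    ]
```

Vectors of this kind could then be passed to any SVM implementation, for example `sklearn.svm.SVC`, with labels marking interacting and non-interacting pairs. The symmetric form is a design choice made here so that the order of the two proteins does not affect the prediction.
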
When tested on a more rigorous, unbalanced large-scale dataset, CCPPI ranks best, with performance at least comparable to that of other related PPI prediction methods. Statistical analyses of the predicted true positives confirm that the effectiveness of CCPPI is partly ascribed to its capability to capture proteomic co-expression and functional similarities between interacting protein pairs. On the other hand, like other related PPI predictors, CCPPI is subject to high false positive rates. Nevertheless, further comparison with homology-dependent PPI predictors suggests that CCPPI is complementary to predictors based on the conservation or phylogenetic profile correlation of interacting protein pairs. Therefore, CCPPI can serve as a promising alternative PPI predictor when homology-dependent methods fail. CCPPI has been made freely available at http://protein.cau.edu.cn/ccppi.

The existence and function of most proteins in the human proteome are regulated by the ubiquitination process. To date, tens of thousands of human ubiquitination sites have been identified in high-throughput proteomic studies. However, the mechanism of ubiquitination site selection remains elusive because of the complicated sequence pattern flanking the ubiquitination sites. The author performs a systematic analysis of 1330 high-confidence ubiquitination sites in 505 protein structures and quantifies the significantly high accessibility and unexpectedly high centrality of human ubiquitination sites. Further analysis suggests that the higher centrality of ubiquitination sites is associated with their multiple functional associations, among which protein-protein interfaces are common targets of ubiquitination. Moreover, the author demonstrates that ubiquitination sites are flanked by residues with non-random local conformation and are surrounded by a non-random set of amino acid residues in three-dimensional protein structures.
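
The abstract does not define the centrality measure it uses. As an illustration only, the sketch below assumes closeness centrality in a residue contact graph built from one representative atom per residue. The 8 Å cutoff and the function names `contact_graph` and `closeness` are hypothetical choices, not details taken from the dissertation.

```python
from collections import deque
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Link residues whose representative atoms lie within `cutoff` angstroms."""
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def closeness(graph, source):
    """Closeness centrality of `source` over the residues reachable from it."""
    hops = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest path lengths in hops
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0
```

Under this assumption, a site buried in the middle of the contact network scores higher than a residue at the periphery, and the scores of ubiquitinated lysines could be compared with those of non-ubiquitinated lysines from the same structures.
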
Finally, the author provides quantitative and unambiguous evidence that most of the structural propensities contain specific information about ubiquitination site selection, and that this information is complementary to the sequence pattern. The results therefore strongly suggest an additional structural level in the mechanism of ubiquitination site selection.

The microtubule is one of the major components of the eukaryotic cytoskeleton. Its functional impacts, including but not limited to the regulation of cell morphogenesis, cell division, intracellular trafficking and cell signaling, are unleashed and controlled by a series of MAPs. Specialists in this field are aware of the diversity of known MAPs and are driving the identification of new types of MAPs. By contrast, there is neither a dedicated database recording the known MAPs nor a MAP predictor that can facilitate the discovery of potential MAPs. The author reports the establishment of a MAP-centered online analysis tool, MAPanalyzer, which consists of a MAP database and a MAP predictor. In the database, a core MAP dataset, fully manually curated from the literature, is supplemented by MAP information collected via an automated pipeline. This dataset in turn enables the summary of representative motifs of known MAPs. A semi-supervised SVM classifier exploiting these motifs is combined with BLAST homolog searching to build the MAP predictor. Benchmarks on a high-quality independent dataset and the Arabidopsis thaliana whole-genome dataset show that the proposed predictor outperforms not only its own components (i.e., the SVM classifier and BLAST) but also PSI-BLAST, another homolog searching tool popular in the field. Like CCPPI, MAPanalyzer is freely available at http://systbio.cau.edu.cn/mappred/.
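
How the SVM classifier and the BLAST search are combined in the MAP predictor is not stated in the abstract. The sketch below shows one simple possibility under stated assumptions: a protein is called a candidate MAP if its best BLAST hit against known MAPs is significant, and otherwise the motif-based SVM decision score is used. The function name `predict_map` and both cutoffs are hypothetical.

```python
from typing import Optional, Tuple


def predict_map(
    svm_score: float,
    best_evalue: Optional[float],
    evalue_cutoff: float = 1e-5,  # hypothetical BLAST significance cutoff
    score_cutoff: float = 0.0,    # hypothetical SVM decision threshold
) -> Tuple[bool, str]:
    """Return (is_candidate_map, evidence) for one query protein.

    `best_evalue` is the lowest E-value among hits to known MAPs,
    or None when the homolog search returned no hit.
    """
    if best_evalue is not None and best_evalue <= evalue_cutoff:
        return True, "blast"   # a close homolog of a known MAP was found
    if svm_score >= score_cutoff:
        return True, "svm"     # no homolog, but the motif features look MAP-like
    return False, "none"
```

For example, `predict_map(0.8, None)` flags a protein that has no detectable homolog among known MAPs but scores positively with the classifier. An either-or rule of this kind is one way a combined predictor could recover candidates that homolog searching alone would miss.
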
Keywords/Search Tags: Protein-protein interaction, Machine learning, Ubiquitination, Microtubule-associated protein, Feature extraction