On The Prediction Of Protein Function Based On Data Mining

Posted on: 2014-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C L Sun
GTID: 1260330401975986
Subject: Bioinformatics and systems biology
Abstract/Summary:
With the arrival of the post-genomic era, proteomics, which focuses on investigating protein function, has attracted a lot of attention in the life sciences. Proteins play a critical role in life and are responsible for many important functions, such as forming organs, catalyzing biochemical reactions, receiving and transmitting cell signals, and maintaining the cellular environment. However, the function of many proteins is unknown. For example, about half of human proteins remain uncharacterized. Characterizing protein function with traditional laboratory experiments is expensive and time-consuming. Computational biology provides an alternative way to predict protein function from high-throughput proteomics data. In this study, we use data mining to predict protein function based on protein primary structures, protein-protein interactions, protein expression data, and other data. In particular, we focus on the following topics.

1) We constructed a novel model, FGsub, to predict protein subcellular localizations for the fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae). All available fungal protein subcellular localization annotations were collected and integrated into a database. On the one hand, we designed an ensemble classifier to predict protein subcellular localizations, in which Support Vector Machines (SVMs) were employed as the learners, based on diverse feature descriptions. On the other hand, BLAST was used to transfer annotations from homologous proteins to uncharacterized F. graminearum proteins, so that the F. graminearum proteins are annotated more comprehensively. Furthermore, we present a new algorithm to cope with the class imbalance problem that arises in protein subcellular localization prediction, which addresses the imbalance and helps avoid false positive results. The highly accurate predictions from FGsub can help one better understand the function of F. graminearum proteins and provide insights into the pathogenic mechanisms of this destructive fungal pathogen.

2) A new model was developed to predict protein S-glutathionylation sites. First, we collected experimentally determined S-glutathionylated proteins and constructed a protein S-glutathionylation database by text mining. Then, we proposed a new method for predicting S-glutathionylation sites by applying machine learning methods to protein sequence data. The model could predict protein S-glutathionylation sites effectively and help to uncover the mechanisms of protein S-glutathionylation.

3) A novel probability model was proposed to construct the protein phosphorylation network. First, we integrated protein phosphorylation expression data and protein-protein interaction data and scanned all the expressed proteins for phosphorylation motifs. Then, we calculated the probability that a motif interacts with a kinase and the probability that a protein substrate is phosphorylated by a kinase, and predicted the kinase-substrate relations. Finally, we constructed human tissue-specific protein phosphorylation networks by incorporating tissue-specific protein expression data. Functional enrichment analysis of the networks demonstrated that each of the three tissue-specific phosphorylation networks was functionally consistent with its corresponding tissue.
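The motif-scanning step in the phosphorylation network topic can be pictured with a small sketch. The abstract does not say which motifs were used, so the two regular expressions below are simplified, hypothetical stand-ins (a proline-directed [S/T]-P pattern and a basophilic R-x-x-[S/T] pattern), and the function name `scan_motifs` is likewise an assumption made only for illustration. The sketch reports candidate serine/threonine positions and says nothing about how the dissertation computed its probabilities.

```python
import re

# Hypothetical, simplified motif patterns for illustration only;
# they are not the motif set used in the dissertation.
# The lookahead lets overlapping occurrences be reported.
MOTIFS = {
    "proline_directed": re.compile(r"(?=([ST])P)"),  # S/T followed by P
    "basophilic": re.compile(r"(?=R..([ST]))"),      # R, any two residues, then S/T
}


def scan_motifs(sequence):
    """Return (motif_name, position) pairs, sorted by position then name.

    The position is the 1-based index of the candidate S/T residue.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            # Group 1 captures the candidate phospho-acceptor residue.
            hits.append((name, match.start(1) + 1))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    for name, position in scan_motifs("MRKASPLT"):
        print(name, position)
```

In a fuller pipeline, a scan like this would presumably run only over proteins present in the expression data, and each hit would then be scored against candidate kinases.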
Keywords/Search Tags: data mining, protein function, protein subcellular localizations, S-Glutathionylation, phosphorylation