
Data Integration And Analysis Of Phosphoproteome

Posted on: 2021-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C W Wang
GTID: 1480306107458084
Subject: Biomedical engineering
Abstract/Summary:
As one of the most important and most widely studied post-translational modifications, protein phosphorylation is involved in regulating almost all biological processes in eukaryotes. From a biochemical perspective, the phosphorylation reaction is catalyzed by protein kinases, which covalently attach a phosphate group from ATP to the substrate protein. Protein phosphorylation is reversible; the reverse process, dephosphorylation, is catalyzed by protein phosphatases. In eukaryotes, phosphorylation mainly occurs on serine (S), threonine (T), and tyrosine (Y) residues. Current studies suggest that about 30% of eukaryotic proteins can be phosphorylated, implying that a massive number of phosphorylation sites remain to be identified. Mass spectrometry (MS)-based high-throughput phosphoproteomics is now the most common strategy for large-scale identification of phosphorylation sites, and more than ten thousand sites can be identified in a single experiment. As phosphorylation sites accumulate, the main issues in this field are how to collect, integrate, and annotate the data systematically, how to identify the upstream regulatory kinases efficiently, and how to determine the key phosphorylation regulation events in specific biological processes. This work therefore focuses on protein phosphorylation, carrying out data integration and analysis of the phosphoproteome.

We first constructed a comprehensive annotation database of eukaryotic phosphorylation sites, EPSD. By curating the literature and integrating existing databases, EPSD collected 1,616,804 phosphorylation sites in 209,326 phosphoproteins from 68 eukaryotes, and the source of each site was retained. For phosphorylation sites with localization probability (LP) information, we classified the sites into classes I to IV based on their LP scores, which represent the credibility of the sites. In addition, to annotate the phosphorylation sites and the corresponding phosphoproteins comprehensively, EPSD annotates the phosphorylation substrates of 8 model species in 15 aspects, mainly by integrating databases and applying prediction tools. In total, the annotation information comes from 100 databases and tools. To date, compared with other eukaryotic phosphorylation site databases, EPSD contains the largest number of sites and the most comprehensive annotations.

On this basis, to identify the regulatory kinases of phosphorylation substrates efficiently, we constructed a new kinase-specific phosphorylation site prediction tool, GPS 5.0, based on the GPS algorithm previously developed by our laboratory. After the data were updated, the GPS 5.0 training data set contained 15,194 phosphorylation sites with their regulatory kinases. Then, using the logistic regression (LR) algorithm, the motif length selection (MLS) and matrix mutation (MaM) methods in GPS 2.1 were updated to the position weight determination (PWD) and scoring matrix optimization (SMO) methods, respectively, which contributed to the comparable or better performance of GPS 5.0 relative to other tools. Moreover, GPS 5.0 includes additional predictors for dual-specificity kinases. In addition to the classic module, GPS 5.0 provides a species-specific module for the prediction of 44,795 protein kinases in 161 species. Finally, local software and an online website were developed to make GPS 5.0 convenient for experimentalists.

Next, to link abnormal phosphorylation events in hepatocellular carcinoma (HCC), a non-periodic process, with its clinical treatment, we collected postoperative cancer tissues from 19 HCC patients, together with 4 adjacent tissues and 4 HCC cell lines. Label-free quantitative phosphoproteomics was performed on these samples, identifying 49,933 phosphorylation sites in 9,061 phosphoproteins. Based on these data, we developed an integrative pipeline for the inference of the druggable kinome, iDruKin. A total of 16 potential druggable kinases and their corresponding drugs for HCC treatment were predicted. These kinases include not only oncogenic kinases such as MTOR and CDK4/6, with their corresponding inhibitors, but also tumor suppressors such as PKCs and AMPKs, with their activators. These druggable kinases and drugs, for example MTOR and its corresponding drug rapamycin, were confirmed to be effective in the treatment of HCC by subsequent cell, tissue, and mouse experiments. Moreover, comparing the activities of the druggable kinases among patients with different prognoses showed that recurrent and non-recurrent HCC patients could be distinguished by these kinases, suggesting that the predicted kinome could be used as a prognostic marker for HCC.

Meanwhile, we adopted the model organism Drosophila melanogaster to study the periodic process of circadian rhythm. Heads of WT and per0 flies were collected over 2 days at 3-hour intervals under constant darkness. TMT-labeled quantitative phosphoproteomics, proteomics, and transcriptomics were carried out. With the resulting data, we developed a pipeline for integrating circadian multi-omics data, iCMod. Using this pipeline, circadian mRNAs, proteins, and phosphorylation sites were identified at their respective omics levels. Compared with WT flies, the loss of per markedly disrupted circadian rhythmicity in the fly head, eliminating most of the circadian oscillations. In addition, based on the normalized circadian phosphorylation sites (NCPs), iCMod predicted 27 potential circadian kinases, 7 of which had already been confirmed. Through subsequent experimental verification, 3 new circadian kinases were identified, and 3 remain to be further verified.

In summary, a series of studies on protein phosphorylation has been carried out. By collecting and annotating eukaryotic phosphorylation sites, a user-friendly database was constructed. Based on that, a kinase-specific phosphorylation site prediction tool was developed. In addition, corresponding analysis pipelines were developed for the phosphoproteomes of non-periodic and periodic biological processes.
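
The grouping of sites into classes I to IV by localization probability, as described for EPSD, can be illustrated with a minimal sketch. The abstract does not state the cutoffs, so the thresholds 0.75, 0.5, and 0.25 used here are an assumption borrowed from a convention commonly seen in phosphoproteomics, and `classify_site` is a hypothetical helper rather than part of EPSD.

```python
def classify_site(lp: float) -> str:
    """Map a localization probability (LP) in [0, 1] to a confidence class.

    Assumed cutoffs (not given in the abstract):
      class I:   LP > 0.75
      class II:  0.5  < LP <= 0.75
      class III: 0.25 < LP <= 0.5
      class IV:  LP <= 0.25
    """
    if not 0.0 <= lp <= 1.0:
        raise ValueError(f"LP must be within [0, 1], got {lp}")
    if lp > 0.75:
        return "I"
    if lp > 0.5:
        return "II"
    if lp > 0.25:
        return "III"
    return "IV"


# Hypothetical sites given as (protein, position, LP) purely for illustration.
sites = [("P1", 15, 0.98), ("P1", 42, 0.61), ("P2", 7, 0.30), ("P2", 88, 0.10)]
classes = {(protein, pos): classify_site(lp) for protein, pos, lp in sites}
```

A higher class number means lower confidence that the phosphate group is placed on the reported residue, so downstream analyses can filter on class I sites when strict localization is needed.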
Keywords/Search Tags: Protein phosphorylation, Phosphoproteomics, Database, Prediction of phosphorylation sites, Hepatocellular carcinoma (HCC), Circadian rhythm