
Bioinformatic Analysis Of Functional Post-translational Modifications In Lung Cancer

Posted on: 2021-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Q Zhou
GTID: 1484306107457214
Subject: Bio-IT
Abstract/Summary:
Post-translational modifications (PTMs) play important roles in regulating many cellular signaling events. Most PTMs change the physical and chemical properties of proteins by adding a variety of small molecules to them, thereby activating or inactivating them. To date, hundreds of PTM types have been reported. As two of the most widespread PTM types across cellular processes, protein phosphorylation and ubiquitination are closely linked to many diseases, especially cancers. In recent decades, the development of high-throughput sequencing and mass spectrometry techniques has produced an explosion of cancer-related multi-omics data, including PTM resources. The question that follows is how to integrate and analyze these data and mine them for valuable PTM events in cancer. This dissertation therefore focuses on lung cancer, one of the most malignant carcinomas, to identify functional PTM events involved in its occurrence and development.

Because a comprehensive public resource for ubiquitin and ubiquitin-like (UB/UBL) conjugation systems was lacking, we first constructed iUUCD 2.0, a database of UB/UBL conjugations. We searched PubMed with multiple keywords and then carried out a genome-wide identification using hidden Markov models (HMMs) and orthologous searches. As a result, iUUCD 2.0 contains 136,512 UB/UBL regulators, including 1,230 E1s, 5,636 E2s, 93,343 E3s, 9,548 DUBs, 30,173 UBD proteins and 11,099 ULD proteins, classified into 74 families across 148 eukaryotes. In particular, detailed annotations for the regulators in 8 common model species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae and Schizosaccharomyces pombe) were integrated from 68 additional public databases, covering 11 aspects: (i) Cancer Mutation, (ii) Single Nucleotide Polymorphism (SNP), (iii) mRNA Expression, (iv) DNA & RNA Element, (v) Protein-protein Interaction, (vi) Protein 3D Structure, (vii) Disease-associated Variation, (viii) Drug-target Relation, (ix) PTMs, (x) DNA Methylation and (xi) Protein Expression/Proteomics. We anticipate that iUUCD 2.0 will be a useful resource for further study of UB/UBL conjugations.

As described above, multi-layer annotations of PTM regulators are now available, and many of them are closely related to tumorigenesis, such as cancer mutations, mRNA expression, PTMs, DNA methylation and proteomics. To make full use of this multi-layer information and uncover potential cancer drug targets, data from The Cancer Genome Atlas (TCGA) were used to explore the driver kinases of lung cancer. Somatic mutation data from 567 samples, copy number variation data from 561 samples, mRNA expression data from 533 tumor and 59 normal samples, 27K DNA methylation data from 127 tumor and 24 normal samples, and 450K DNA methylation data from 491 tumor and 32 normal samples were collected and subjected to differential analysis. Based on the results of the differential analyses between tumor and normal samples, a logistic regression model was constructed to evaluate driver kinase genes in lung cancer. Applying this machine learning model, we predicted 36 potential lung cancer driver kinases and verified the function of these candidates in a K-rasG12D-driven lung cancer mouse model.

After building the driver kinase prediction algorithm by integrating genomic and transcriptomic data, we made further efforts to discover vital PTM events in tumorigenesis through proteomics and phosphoproteomics. Mass spectrometry-based phosphoproteomics data from lung cancer patients and normal tissues, collected from two public proteomic resources, the PRoteomics IDEntifications database (PRIDE) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), were used to develop an integrated multi-sample, trans-engine platform for phosphoproteomics identification. In this platform, the raw phosphoproteomics data of 232 tumor samples and 102 normal samples were re-searched with 9 commonly used peptide search engines. A logistic regression model was then constructed to evaluate each identified phosphorylation site according to the re-searching results and the corresponding scores. At a false discovery rate below 1%, a total of 155,711 non-redundant phosphorylation sites were identified by the integrated platform, a 5% to 265% increase over single-engine identification and an average 1.87-fold increase over single-sample identification. The integrated platform and its scoring system were then used to filter differential phosphorylation sites and phosphoproteins between lung cancer and normal samples. As a result, 183 phosphorylation sites belonging to 169 phosphoproteins were found to differ significantly between tumor and normal samples. These differential phosphorylation sites were then re-ranked by comparing the results of the integrated platform with those of MaxQuant, the most popular proteome analysis tool. Finally, the top 30 phosphoproteins were retained and tested in the A549 cell line to verify their functions, and 18 of them were shown to have strong effects on tumor cell proliferation. In short, by resolving key technical issues in identification and integration across multiple samples and search engines, the new integrated platform improved the throughput and accuracy of phosphoproteomics identification. Applying the platform also revealed potential functional PTM events in lung cancer, which provides a promising reference for research on lung cancer-related phosphorylation mechanisms.

In summary, this work focuses on functional PTMs and their relationships with lung cancer. First, we constructed iUUCD 2.0, an integrated annotation database for ubiquitin and ubiquitin-like conjugation. Then, considering the multi-layer annotations of PTM regulators, we developed a prediction algorithm for lung cancer driver kinases using genomic, transcriptomic and epigenomic data. Finally, to further discover important functional PTM events in lung cancer by additionally incorporating proteomics and phosphoproteomics, we established an integrated multi-sample, trans-engine platform for phosphoproteomics identification. Overall, this work provides a new strategy for the multi-omics analysis of lung cancer and the study of its key functional PTM events, in terms of PTM site identification, molecular mechanism and regulation.
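
The abstract describes a logistic regression model that scores each phosphorylation site from the results of several search engines. The following is a minimal sketch of that general idea only, not the dissertation's model: the three engine scores per site, the toy labels, the learning rate and the helper names (`fit_logistic`, `predict_proba`) are all assumptions made for illustration, and the fit uses plain batch gradient descent so that the example needs nothing beyond the Python standard library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that a site with per-engine scores x is a true identification."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical toy data: each row holds normalized scores reported for one
# candidate site by three imaginary search engines; label 1 marks a site
# treated as correct, label 0 a site treated as a false identification.
X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.9],
    [0.7, 0.6, 0.8],
    [0.2, 0.1, 0.3],
    [0.3, 0.2, 0.1],
    [0.1, 0.3, 0.2],
]
y = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(X, y)
high = predict_proba(weights, [0.85, 0.90, 0.80])  # engines agree: strong match
low = predict_proba(weights, [0.15, 0.20, 0.10])   # engines agree: weak match
print(f"strong-evidence site: {high:.3f}, weak-evidence site: {low:.3f}")
```

In a real pipeline the combined probability would be one input to a separate false discovery rate estimate before sites are accepted; that step is not shown here.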
Keywords/Search Tags: Functional post-translational modifications, Multi-omic analysis, Logistic regression, Bioinformatics, Lung cancer, Multiple tools integration