| Cancer is a leading cause of death worldwide. According to estimates from the International Agency for Research on Cancer (IARC), about 14.1 million new cancer cases and 8.2 million cancer deaths occurred globally in 2012. With increasing incidence and mortality, cancer has been the leading cause of death in China since 2010 and is a major public health problem. Because of China's massive population (1.37 billion), the number of cancer cases is huge (22% of new cases and 27% of deaths worldwide), which endangers the health of the Chinese population and places an enormous burden on society. Thus, it is necessary to understand the underlying mechanisms of cancer development and to develop new treatment strategies. Researchers have summarized the hallmarks of cancer: self-sufficiency in growth signals, insensitivity to antigrowth signals, resisting cell death, limitless replicative potential, sustained angiogenesis, tissue invasion and metastasis, avoiding immune destruction, and deregulating cellular energetics. Genome instability and the resulting aberrations can help tumor cells acquire multiple hallmarks of cancer. The rapid development of DNA sequencing technology has provided a better understanding of the human genome and the cancer genome. Researchers have identified a large number of somatic aberrations (mutations, copy number aberrations, fusions, etc.) in cancer genomes, but only very few of them confer an evolutionary advantage on tumor cells, enable cells to acquire the hallmarks of cancer, and eventually facilitate tumorigenesis. The identification of these driver aberrations accelerated the development of the cancer driver theory, in which somatic mutations are considered the major drivers of cancer. However, studying driver genes remains challenging: because of their low frequency, aberrations in known onco-driver genes can hardly explain the development of cancer in all patients. The epi-driver hypothesis was therefore proposed: there is a source of genes that are aberrantly expressed in cancers in
a fashion that confers a selective growth advantage. However, no study has systematically investigated these genes, although they are thought to be important complements to mut-driver genes, leaving a gap in the driver theory of cancer. According to previous studies, tumorigenesis shares many features with gametogenesis: immortalization of primordial germ cells and transformation of tumor cells; meiosis of spermatogonia and aneuploidy in tumor cells; migration of primordial germ cells and metastasis of tumor cells. It was therefore proposed that some genes involved in gametogenesis can be re-activated and consequently drive the development of cancer independently. Recently, an increasing number of proteins have been found to be expressed only in germ cells, trophoblast cells and cancer cells; these are called cancer-testis (CT) antigens. Increasing evidence suggests that these proteins are the basis of the gamete recapitulation theory and participate in tumorigenesis. Moreover, we propose that they may serve as an important source of candidate cancer epi-driver genes because of their characteristic expression pattern. Thus, systematic investigation of CT genes through the similarity between tumorigenesis and gametogenesis may help illuminate the mechanism of cancer development, complete the cancer driver gene theory, bring new insight into cancer prevention and diagnosis, and provide candidates for molecular personalized medicine. In this study, we systematically identified cancer-testis genes by integrating multi-omics data and explored the driving role and potential activation mechanisms of CT genes. Part I. Systematic identification of cancer-testis genes. Because of the blood-testis barrier, the testis has immune privilege and some testis-specific proteins are immunogenic. Thus, the pioneering identification of CT genes was a byproduct of research on cancer antigens, which used methods based on antigen-antibody reactions. After the specific expression pattern of CT genes was observed, researchers began to identify CT
genes using high-throughput platforms. To date, 243 CT genes have been included in the cancer-testis databases. However, these studies of CT genes were challenged by noisy signals and sample heterogeneity. The wide application of RNA-Seq in large publicly available databases, Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA), has greatly reduced the noise and broadened the expression profiles. In addition, normal and tumor samples in these databases were collected with standard protocols, ensuring sample consistency. In this study, we first analyzed transcriptomic data of normal tissues (24 different organs from 175 individuals) from GTEx. We used a specificity measure to classify all 50,016 genes. We also explored the potential activation mechanism of testis-specific genes in gametogenesis. Then, we defined cancer-testis genes by integrating the expression profiles of tumor tissues from 6,638 patients with 19 different cancers and explored their activation mechanism in cancer patients. We found that 8,565 genes could be defined as testis-specific expressed genes (TSGs) in the GTEx data. Among these genes, 1,336 encode proteins and consistently showed a testis-specific expression pattern in multiple databases. These genes were classified into the C1 group. It is worth noting that 5,043 non-coding genes were also expressed specifically in the testis. Enrichment analysis suggested that testis-specific promoters and de-methylation sites were frequently located proximally upstream (-100 bp to 1 kb) of C1 genes (ER_promoter = 10.10, Fisher's exact test P_promoter = 6.42×10^-253; ER_methylation = 5.37, Fisher's exact test P_methylation = 6.57×10^-12). In tumor tissues from the TCGA project, 1,019 (77%) of the 1,336 C1 genes were expressed in at least 1% of the samples (>5 normalized read counts) of any cancer type and were defined as CT genes for that particular cancer. Based on these results, we built our search engine (CTatlas). In addition, we discarded 17 known cancer-testis genes from the known CT database because they failed
to show a testis-specific expression pattern. To identify potential driver CT genes, we considered samples with extremely high expression (EE) of a CT gene as samples in which that gene was activated. CT genes identified in at least 1% of EE samples were defined as extremely highly expressed CT genes (EECTGs) for a particular cancer. In total, 891 of the 1,019 identified CT genes met this criterion in at least one cancer type; 300 EECTPs (extremely highly expressed CT proteins: EECTGs with testis-specific protein expression) were selected for further analysis. We found that C1 genes were approximately twice as likely as non-TSGs to display this EE pattern in cancer samples (ER = 2.05, Fisher's exact test P = 2.21×10^-22). Genes with EE patterns were very rare (median: 0-2) in each cancer sample, and the number of activated EECTPs was inversely associated with the SMG mutation ratio after adjusting for cancer type (Beta = -4.58, linear regression P = 8.28×10^-5), indicating that CT genes may exert a mutation-independent driving role. By integrating DNA methylation data from TCGA, we observed that the average promoter methylation level of EECTPs was negatively associated with the number of activated EECTPs (Beta = -30.97, linear regression P = 9.68×10^-97) after adjusting for cancer type, suggesting that the DNA methylation level in the promoters of CT genes may be responsible for their activation in cancer. We also measured CT gene expression in 24 lung adenocarcinoma (LUAD) samples by RNA sequencing and successfully validated 19 EECTPs (7 novel). Two genes (RHOXF1 and VCX3B) were significantly de-methylated in EE samples, suggesting that they might be re-activated through aberrant promoter methylation. Two other genes (LIN28B and MEIOB) were found to be regulated by nearby testis-specific non-coding genes (LINC00577 and LINC00254, respectively). In sum, we described a comprehensive, genome-wide landscape of CT genes and built a new CT gene search engine based on the transcriptomic data from
thousands of individuals. By integrating multi-omics data, we further explored their potential to be driver genes. We also provided evidence that EECTPs could be regulated by promoter DNA methylation levels and by nearby testis-specific non-coding genes, suggesting that they have features of epi-driver genes. Overall, our results provide a reliable database of CT genes, complement the driver theory of cancer, bring new insight to cancer research and offer a number of candidate targets for molecular targeted therapy. Part II. LIN28B can drive the proliferation and metastasis of lung adenocarcinoma. In the first part of our study, we identified a new CT gene, LIN28B. Its homolog, LIN28A, is a classical pluripotency gene whose activation can induce pluripotency and which plays an important role in development and tumorigenesis. Recent studies have reported that LIN28B drives neuroblastoma and hepatocellular carcinoma by regulating the expression of let-7. Thus, we hypothesized that LIN28B might be an epi-driver gene of lung adenocarcinoma. We conducted a series of analyses and experiments to test this hypothesis. We first showed that LIN28B could drive proliferation and metastasis in in vitro and in vivo experiments and explored the downstream pathways of LIN28B through cell cycle assays, comet assays and bioinformatics analysis. We then investigated the putative activation mechanisms of LIN28B: we demonstrated the driving role of LINC00577 through similar in vitro and in vivo experiments; we performed differential expression analysis to identify mutations upstream of LIN28B; and we re-assembled RNA-Seq data from H1299 tumor cells to clarify the isoforms of LIN28B and linked LIN28B isoforms to the DNA methylation levels of their promoters. We found that stable overexpression or knockout of LIN28B significantly altered the proliferation and metastasis of LUAD cells. A mouse tumorigenicity assay suggested that LIN28B
overexpression increased the occurrence of visible tumors compared with the control group. Mice injected with H1299-LIN28B+ cells died more rapidly than mice injected with H1299-control cells, and the number of metastatic nodules on the surface of the liver was significantly higher in mice injected with H1299-LIN28B+ cells than in mice injected with H1299-control cells. These results demonstrated the driving role of LIN28B. Next, we performed co-expression analysis and found that genes co-expressed with LIN28B were significantly enriched in the cell cycle, DNA replication and damage repair, meiosis and Fanconi anemia pathways. We observed that overexpression of either LIN28B or LINC00577 resulted in a significantly reduced population of G2/M cells and increased proliferation. Overexpression of LIN28B or LINC00577 significantly increased the number of DNA strand breaks in cells treated with UV light, but had no effect in untreated cells. In addition, the AICNA signal was significantly associated with LIN28B and LINC00577 activation (Wilcoxon's rank sum test, P_LIN28B = 3.98×10^-7, P_LINC00577 = 0.007). These results indicated that LIN28B participates in the regulation of the cell cycle and DNA damage repair, and eventually influences genome stability. Overexpression of LINC00577 up-regulated the expression of LIN28B, and similar in vitro and in vivo experiments showed that LINC00577 acted similarly to LIN28B, suggesting that LINC00577 might be a key regulator of LIN28B. Differential expression analysis of tumor suppressor genes showed that activation of LIN28B was significantly associated with loss-of-function mutations in SMARCA4. We also observed SMARCA4 binding sites in the upstream region of LIN28B. In addition, we found that H3K27ac peaks were significantly enriched in cell lines with SMARCA4 alterations. These results suggested that LIN28B might be an important downstream oncogene of the chromatin remodeling gene SMARCA4. We identified
new isoforms of LIN28B in H1299 cells. In patients from the TCGA project, the expression of the different isoforms was negatively correlated with the DNA methylation level of their corresponding promoters. Together, these results indicated that LIN28B is an epi-driver gene of lung cancer. In sum, the second part of our research focused on the driving role of LIN28B in lung adenocarcinoma. We showed that LIN28B is not only a driver gene of lung adenocarcinoma but also an epi-driver gene. We also explored three potential activation mechanisms of this driving effect. These results provide new insights that complement the first part: we provide a typical example of an epi-driver gene and an ideal target for molecular therapy in patients with lung adenocarcinoma, especially those with SMARCA4 loss. |
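The CT gene definition used in Part I (a testis-specific C1 gene expressed at >5 normalized read counts in at least 1% of the samples of a cancer type) can be illustrated with a minimal Python sketch. This is only an illustration of the stated thresholds, not the pipeline used in the study: the function names `is_ct_gene` and `call_ct_genes`, the nested-dictionary input layout, and the toy gene names are all assumptions made for the example.

```python
from typing import Dict, List

# Thresholds taken from the abstract; everything else here is hypothetical.
EXPRESSION_CUTOFF = 5.0      # ">5 normalized read counts"
MIN_SAMPLE_FRACTION = 0.01   # "at least 1% of the samples"


def is_ct_gene(counts: List[float],
               cutoff: float = EXPRESSION_CUTOFF,
               min_fraction: float = MIN_SAMPLE_FRACTION) -> bool:
    """Return True if the gene is expressed above `cutoff` in at least
    `min_fraction` of the tumor samples of one cancer type."""
    if not counts:
        return False
    expressed = sum(1 for c in counts if c > cutoff)
    return expressed / len(counts) >= min_fraction


def call_ct_genes(expression: Dict[str, Dict[str, List[float]]]) -> Dict[str, List[str]]:
    """expression[cancer_type][gene] holds normalized read counts per sample.

    The input is assumed to contain only testis-specific (C1) genes.
    Returns, for each cancer type, the sorted list of genes called as CT genes.
    """
    return {
        cancer: sorted(gene for gene, counts in genes.items() if is_ct_gene(counts))
        for cancer, genes in expression.items()
    }


if __name__ == "__main__":
    # Toy data: 200 samples of one cancer type, three made-up genes.
    toy = {
        "LUAD": {
            "GENE_A": [0.0] * 197 + [6.0, 10.0, 50.0],  # expressed in 3/200 samples
            "GENE_B": [0.0] * 199 + [6.0],              # expressed in 1/200 samples
            "GENE_C": [5.0] * 200,                      # never strictly above 5
        }
    }
    print(call_ct_genes(toy))
```

In this toy input only `GENE_A` passes, because it exceeds the count cutoff in 1.5% of samples; `GENE_B` falls below the 1% sample fraction and `GENE_C` never exceeds the strict `>5` cutoff.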