Colorectal cancer (CRC) is the third most commonly diagnosed cancer in the world. It kills almost 700,000 people every year, making it the world's fourth most deadly cancer after lung, liver and stomach cancer. With China's rapid economic growth and rising living standards in recent years, the incidence of CRC has increased. Twin studies estimate the heritability of CRC at approximately 35%. Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with CRC. Surprisingly, known CRC-associated SNPs explain only 1%-4% of the heritability. Copy number variations (CNVs), a new form of genome diversity, are classically defined as large deletions, duplications or inversions of DNA segments (>1 kb). Recently, researchers have found that CNVs can explain part of the "missing heritability". Moreover, both the mutation rate and the genome coverage of CNVs are far higher than those of SNPs. The impact of CNVs on human diseases has therefore attracted great interest in recent years. However, many studies have suggested that the majority of common CNVs are in linkage disequilibrium with SNPs in the human genome, so directly investigating common CNVs is unlikely to identify many new disease-associated variants. Interest therefore remains focused on rare CNVs. Recent studies increasingly suggest that rare CNVs, despite their relatively low frequencies, have substantial effects on the development of complex diseases. So far, studies of rare CNVs in colorectal cancer remain limited.

Purpose: To explore the effects of rare CNVs on the development of CRC, to screen for CRC-associated rare CNVs, and to assess their clinical significance.
Our study may provide new evidence for further studies on CRC mechanisms.

Materials and Methods: First, we conducted a case-control study and genotyped peripheral blood DNA from 1004 sporadic CRC cases and 1994 cancer-free controls using Illumina HumanOmniExpress-12 v1.0 BeadChips. Both PennCNV and QuantiSNP were then used to identify CNVs from the chip data and to exclude CNVs or samples with low confidence. We used PLINK to screen for rare CNVs, defined as those with a frequency of <0.5% in our dataset, for the subsequent analyses. Ten rare CNVs were randomly selected, and two pairs of primers were designed for each CNV segment. Quantitative real-time PCR (qPCR) was then performed to validate the CNVs identified by both programs. Burden analyses were conducted to evaluate differences between cases and controls in the distribution of global rare CNVs, rare genic CNVs and rare CNVs overlapping protein-coding sequences (CDSs). In addition, stratification analyses were conducted by tumor location, age of onset and gender. The genes disrupted by rare CNVs were identified using gene-based analysis. Furthermore, GO (Gene Ontology) enrichment analysis was performed for the CNV-disrupted genes exclusive to CRC cases. We compared the expression of genes in significantly enriched GO terms between CRC tissues and paired adjacent normal tissues using GEO (Gene Expression Omnibus) datasets. We further screened candidate genes according to the results of the gene-based analysis and expression microarray data from three sources: TCGA (The Cancer Genome Atlas), GEO and our in-house data (expression chip data from tumor buddings, cancer cells in the center of the primary tumor and intestinal epithelial cells from CRC patients, as well as their respective stromal components).
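The rare-CNV definition used above (frequency <0.5% in the dataset) can be illustrated with a minimal sketch. It is not PLINK's implementation: the input format (pairs of sample ID and a predefined CNV region ID), the function name `rare_cnv_regions` and the toy data are all assumptions made for illustration, and real CNV frequency filters work on overlapping segments rather than on pre-assigned region IDs.

```python
from collections import defaultdict

RARE_FREQ_THRESHOLD = 0.005  # 0.5%, the cutoff stated in the text


def rare_cnv_regions(calls, n_samples, threshold=RARE_FREQ_THRESHOLD):
    """Return the CNV region IDs carried by fewer than `threshold` of samples.

    calls: iterable of (sample_id, region_id) pairs (hypothetical format).
    n_samples: total number of genotyped samples (cases plus controls).
    """
    carriers = defaultdict(set)
    for sample_id, region_id in calls:
        carriers[region_id].add(sample_id)  # each sample counts once per region
    return {
        region
        for region, samples in carriers.items()
        if len(samples) / n_samples < threshold
    }


# Toy data (invented): 1000 samples; region "A" has 3 carriers, region "B" has 50.
toy_calls = [(f"s{i}", "A") for i in range(3)] + [(f"s{i}", "B") for i in range(50)]
rare = rare_cnv_regions(toy_calls, n_samples=1000)
```

In this toy example only region "A" (3 of 1000 samples, 0.3%) passes the filter, while region "B" (5%) is treated as common and excluded.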
The SLC18A1 (solute carrier family 18 member A1) gene, whose copy number states were significantly associated with its expression levels and whose expression levels differed significantly between CRC tissues and adjacent normal tissues, was selected as a candidate gene. The paired Wilcoxon test was used to compare SLC18A1 mRNA levels between primary colorectal cancer tissues and the paired adjacent normal tissues. Analysis of covariance was used to compare SLC18A1 expression levels between copy number states, with gender and age as covariates. The association between SLC18A1 CNVs and CRC risk was further validated using a TaqMan copy number assay with peripheral blood DNA from 3641 independent samples, including 934 CRC patients and 2680 cancer-free controls. To compare the frequency of SLC18A1 copy number changes between CRC tissues and adjacent normal tissues, DNA from CRC tissues and adjacent normal tissues of 96 Han Chinese CRC cases was also used for SLC18A1 copy number detection. In addition, copy number data from 615 CRC tissues and 544 normal tissues in TCGA were analyzed to compare the frequencies. The chi-square test was used to evaluate the association between SLC18A1 copy number states in CRC tissue DNA and clinicopathological characteristics, including depth of infiltration, lymph node metastasis, distant metastasis, TNM stage and overall survival, in both the 96 CRC cases from our laboratory and 532 CRC cases from TCGA. To determine whether loss of SLC18A1 was associated with expression level and prognosis in other cancer types, we further examined SLC18A1 expression in a pan-cancer dataset comprising 33 cancer types from TCGA.
For survival analysis of the CRC cases from our laboratory, the TCGA CRC cases and the TCGA pan-cancer data, we derived Kaplan-Meier (KM) survival curves and used the log-rank test to compare overall survival between patients with different SLC18A1 copy number states or different SLC18A1 expression levels. Finally, the deletion segments of SLC18A1 were viewed in the UCSC Genome Browser to predict their potential functions. All statistical analyses were performed using SPSS version 19.0 or PLINK, and a P value of less than 0.05 was considered statistically significant.

Results: The global burden analysis revealed a 1.53-fold excess of rare CNVs in CRC cases compared with controls (P < 1×10⁻⁶), and the difference was more pronounced for genic rare CNVs and CNVs overlapping coding regions (1.65-fold and 1.84-fold, respectively; both P < 1×10⁻⁶). When CRC cases were divided into three groups by age tertile, cases in both the lowest and the middle tertile carried a higher burden of rare CNVs than those in the highest tertile. Gene-based analysis showed that 639 CNV-disrupted genes exclusive to CRC cases were significantly enriched in GO terms concerning nucleosome assembly and olfactory receptor activity. Interestingly, we found that 17 of the 19 genes (number of DAVID gene IDs) associated with "chromatin assembly or disassembly" were disrupted in younger cases. When we further stratified CRC cases into colon and rectal cancer, colon cancer patients displayed more significant enrichment in the burden analysis than rectal cancer patients. All of the aforementioned GO terms were observed in colon cancer after GO enrichment analysis, whereas no significant term was found in rectal cancer.
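The burden comparison reported above (a fold excess of rare CNVs per sample in cases versus controls, with an empirical P value) can be sketched as follows. The text does not describe how the burden P values were computed, so this sketch assumes a simple one-sided label-permutation test; the function names and the per-sample counts are invented for illustration and are not the study data.

```python
import random


def burden_ratio(case_counts, control_counts):
    """Fold excess: mean rare-CNV count per case over mean count per control."""
    mean_case = sum(case_counts) / len(case_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return mean_case / mean_control


def permutation_p_value(case_counts, control_counts, n_perm=10000, seed=0):
    """One-sided empirical P value for an excess CNV burden in cases.

    Case/control labels are shuffled; the test statistic is the total CNV
    count in the case group. With fixed group sizes and a fixed pooled total,
    this orders permutations the same way as the case/control mean ratio,
    and it avoids dividing by zero when a shuffled control group has no CNVs.
    """
    rng = random.Random(seed)
    observed = sum(case_counts)
    pooled = list(case_counts) + list(control_counts)
    n_cases = len(case_counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if sum(pooled[:n_cases]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy per-sample rare-CNV counts (invented, not from the study).
toy_cases = [3, 2, 4, 3, 5, 2, 4, 3]
toy_controls = [1, 2, 1, 0, 2, 1, 1, 2, 1, 1]
fold = burden_ratio(toy_cases, toy_controls)
p_empirical = permutation_p_value(toy_cases, toy_controls)
```

With a permutation approach the smallest attainable P value is bounded by the number of permutations, so a bound such as P < 1×10⁻⁶ would require on the order of a million permutations under this assumption.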
According to two genome-wide expression profile datasets from the GEO database (GDS2947 and GDS4382), more than 40% of the probes corresponding to the "chromatin assembly or disassembly" term were differentially expressed between colorectal adenoma/CRC and adjacent normal tissue.

In the GWAS stage, we found deletions at the SLC18A1 gene in four of the 694 CRC cases but in none of the 1641 cancer-free controls (P = 0.008). In the replication stage, we found 5 additional SLC18A1 deletions among 934 CRC cases, whereas one SLC18A1 deletion was detected among the 2680 controls. Combined analysis of samples from both stages showed an odds ratio (OR) of 16.7 (P = 6.4×10⁻⁵) for developing CRC among individuals with SLC18A1 germline deletions.

CNV genotyping of SLC18A1 in 96 CRC tissues and paired normal tissues showed that SLC18A1 was frequently deleted in CRC tumour tissues (about 33.3%), whereas no deletion was found in any of the paired normal tissues. Deletion of SLC18A1 was significantly associated with distant metastasis (P = 0.036). Consistently, copy number data for 615 CRC cases from TCGA also showed that the frequency of SLC18A1 deletions was relatively high in CRC tissues (~49.3%), and no deletions were found in the normal tissues. In the TCGA data, SLC18A1 deletions were associated with unfavorable clinicopathological characteristics, including depth of infiltration, lymph node metastasis, distant metastasis, TNM stage and overall survival (P = 0.029, P = 5.4×10⁻⁶, P = 2.1×10⁻⁴, P = 2.3×10⁻⁶ and P = 0.052, respectively). Expression data from 17 paired CRC and adjacent normal tissues from GEO and 32 paired tissues from TCGA showed that SLC18A1 was significantly downregulated in CRC tissues compared with paired normal tissues (P = 0.009 and P = 0.004, respectively).
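The GWAS-stage comparison reported above (SLC18A1 deletions in 4 of 694 cases versus 0 of 1641 controls) can be checked with a short worked example. The text does not state which test produced P = 0.008; the sketch below assumes a two-sided Fisher's exact test on the 2x2 table of carriers and non-carriers, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Counts from the text: cases (4 carriers, 690 non-carriers),
# controls (0 carriers, 1641 non-carriers).
p_gwas = fisher_exact_two_sided(4, 690, 0, 1641)
```

Under this assumption the result is roughly 0.0078, which rounds to the reported P = 0.008, although agreement alone does not confirm which test the authors used.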
Meanwhile, the expression levels of SLC18A1 in both the tumor buddings and the cancer cells in the center of the primary tumor were lower than in normal intestinal epithelial cells in all three CRC cases (fold change = 0.17-0.62 and 0.12-0.57, respectively). Furthermore, SLC18A1 expression in the stroma surrounding both the tumor buddings and the cancer cells in the tumor center was also lower than in the stroma of intestinal epithelial cells.

Analysis of the TCGA CRC dataset revealed that SLC18A1 deletions were associated with decreased mRNA expression (P = 2.8×10⁻⁴, N = 369) and with a survival disadvantage (P = 0.052, N = 489). In addition, CRC cases with lower SLC18A1 expression had shorter overall survival than those with higher expression (P = 0.037, N = 312). In the pan-cancer dataset, which comprised 7991 cancer patients from TCGA, deletion of SLC18A1 was significantly associated with decreased expression compared with SLC18A1-neutral tumours (P = 2.15×10⁻³⁵), and patients with the deletion had significantly worse overall survival (P = 4.72×10⁻⁸).

In the UCSC Genome Browser, we observed peaks of the histone modification mark H3K4Me1, DNaseI hypersensitivity clusters and transcription factor binding sites within the overlapping region of the four SLC18A1 deletions identified at the GWAS stage. In addition, dense potential regulatory elements were predicted by ESPERR in this region. Together, these in silico results indicate that potential core regulatory elements may exist in the SLC18A1 CNV segments.

Conclusions:
1. Global rare CNVs, genic rare CNVs and rare CNVs overlapping coding regions were all significantly enriched in the germline DNA of CRC cases compared with cancer-free controls, indicating that rare CNVs increase the risk of CRC.
2. GO enrichment analysis showed that CNV-disrupted genes exclusive to CRC cases were associated with nucleosome assembly and olfactory receptor activity.
3. Younger cases tended to carry a significantly higher burden of rare CNVs than older ones, suggesting that they may have a more pronounced genetic predisposition and thus an earlier onset of CRC.
4. Both the burden analysis and the GO enrichment analysis suggested that the mechanisms of colon and rectal cancer development may not be identical and that colon cancer has a stronger genetic component than rectal cancer.
5. Germline deletions of SLC18A1 significantly increased the risk of CRC.
6. Germline deletions of SLC18A1 were rare, whereas somatic deletions of SLC18A1 were common.
7. Somatic deletions of SLC18A1 in CRC tissues were associated with unfavorable clinicopathological characteristics, including depth of infiltration, lymph node metastasis, distant metastasis, TNM stage and overall survival.
8. Deletions of SLC18A1 were significantly associated with decreased expression levels, which was also unfavorable for the prognosis of CRC, indicating that SLC18A1 may serve as a potential prognostic factor.
9. Expression data for SLC18A1 from TCGA, GEO and cancer/normal cells captured by laser microdissection consistently showed that SLC18A1 expression was lower in CRC tissues than in paired normal tissues.
10. Deletions of SLC18A1 were also significantly associated with decreased SLC18A1 expression and poor prognosis in the TCGA pan-cancer dataset, indicating that the effect of SLC18A1 is not limited to CRC and that it may also play a role in other cancers.
11. In silico analysis using tracks of the UCSC Genome Browser suggested that the deletion segments of SLC18A1 may harbor potential core regulatory elements.