| Colorectal cancer is one of the commonest cancers in China, and its incidence has increased rapidly over the last two decades. Understanding of the molecular mechanisms underlying colorectal carcinogenesis has advanced greatly in recent years, but many novel genes associated with cancer initiation and progression remain to be explored. An enormous volume of data has accumulated in public databases with the completion of large-scale sequencing projects such as the human genome project, the finished human genome project, the haplotype project (HapMap) and the cancer genome anatomy project (CGAP). The expressed sequence tag (EST), a single-pass sequence of complementary DNA (cDNA), is one of the most important resources for finding novel cancer biomarkers. Mining the database of ESTs (dbEST) in GenBank will shed new light on cancer research. In this study, we downloaded ESTs relevant to colorectal normal mucosa, inflammatory bowel disease, adenoma and cancer from dbEST. We developed a bioinformatics software package, GetUni, that uses the sequence annotation features of dbEST and UniGene. We then clustered all these ESTs into UniGene clusters, thereby constructing 4 electronic gene expression libraries: normal mucosa (N), inflammatory bowel disease (IBD), adenoma (A) and cancer (T). There were 4,375, 3,451, 875 and 200,608 UniGenes, 4,108, 2,230, 606 and 18,891 non-redundant UniGenes, or 4,108, 2,230, 592 and 14,879 genes in libraries N, IBD, A and T, respectively. The cDNA Xprofiler analysis of colorectal normal mucosa and cancer at the National Cancer Institute (NCI) cross-validated the efficiency of our GetUni software package. Subsequently, the overall features of these libraries were analyzed with GOTM (GOTree Machine). All genes in libraries N, IBD and A were also found in library T, except bA9F11.1 (Hs.329040), which was present only in library A. 
Each gene in library T had an average of 1.27 transcripts, significantly more than in the other libraries (p<0.01, χ² test). Differences in gene enrichment among these libraries were statistically significant in 50 signaling pathways, as revealed by WebGestalt (p<0.01); examples include enrichment of ribosomal protein genes and genes of the KEGG Glycolysis/Gluconeogenesis pathway in libraries A and IBD, Integrin pathway genes in libraries IBD and N, and 7 transmembrane receptor (rhodopsin family) genes in cancers. Quantitative PCR found elevated expression of RPS2, RPS12, RPS27a, RPL7a, RPL5 and RPL10 in 5, 6, 3, 5, 2 and 3 of 8 adenomas, and in 21, 18, 17, 14, 21 and 13 of 40 colorectal cancers, respectively. However, the differences in expression of these ribosomal protein genes among normal mucosa, adenoma and cancer were not statistically significant (p>0.05). Hierarchical analysis showed 2 distinct groups of colorectal adenomas and cancers: 7 adenomas and 35 cancers fell in the group with high ribosomal protein gene enrichment, and 1 adenoma and 23 cancers in the group with low ribosomal protein gene enrichment. Enrichment of ribosomal protein genes was significantly more common in colorectal adenomas (7/8, 87.5%) than in cancers (18/40, 45%) (p=0.020). Next, we selected 95 genes from the electronic gene expression profile of colorectal cancer to construct a low-density array (LDA), a high-throughput PCR approach that combines TaqMan quantitative PCR with microfluidics. All candidate genes are involved in cell differentiation and development, including 4 signaling pathways (Wnt, TGFβ/BMP4, Hedgehog and Notch), the Polycomb Group (PcG) family and others. GAPDH was used as the internal control for normalization. Of all 96 genes, only DAAM1 could not be detected by LDA. In total, 97.2% of sample-gene tests [11,520/11,844 (384×96)] were successfully amplified. 
To evaluate the reproducibility of LDA, we compared the threshold cycles (Ct) of GAPDH between different experiments. The Ct variation between runs for a given sample ranged from 0.001 to 0.952, with a mean±SD of 0.279±0.254. Spearman correlation and logistic regression analysis showed a significant correlation between different tests in all 60 samples (p<0.0001, r=0.911, r²=0.829). We next analyzed the ΔCt variation between tests for the 94 target genes. The average ΔCt variation was 0.36±0.33, and it was significantly correlated with the mean ΔCt (p<0.0002, r=0.743, r²=0.551). Using LDA, we identified 79 genes differentially expressed between colorectal cancers and normal mucosa (p<0.05), of which 4 were upregulated and 75 downregulated in cancers relative to normal mucosa. Thirty-four genes had a fold change greater than 2. In the Wnt pathway, NKD1 and SOX9 were significantly upregulated, with median fold changes of 15.15 and 1.62, respectively, and APC, DAAM2 and Tcf4 were downregulated 2.71-, 2.82- and 2.18-fold, respectively, in colorectal cancers as compared with normal mucosa (p<0.05). Genes in the TGFβ/BMP4, Hedgehog and Notch pathways were significantly downregulated in colorectal cancers as compared with normal tissue, with median fold changes of TGFβ1 2.41, SMAD4 2.51, L3MBTL 1.39, IHH 2.03, DISP1 1.79, DISP2 13.96, NOTCH2 2.22, MAML1 1.91, MAML2 2.39, MAML3 1.96, HES1 1.76 and HES2 1.78 (p<0.05). In the PcG gene family, only EZH2 showed higher expression in colorectal cancers, with a median fold change of 1.64; the others showed lower expression, with median fold changes of EPC1 2.28, EPC2 1.96, PCGF1 1.93, PCGF2 2.02, PCGF3 1.63, PCGF4 1.49, PCGF5 2.28 and PCGF6 1.58. In 7 cases with individually matched normal mucosa, adenoma and cancer samples, 59 differentially expressed genes were found (p<0.05). Most of these genes showed expression alterations in adenomas consistent with those in cancers. 
We applied SYBR Green Q-PCR to 19 differentially expressed genes identified by LDA in 22 colorectal cancer patients with individually matched normal and cancer samples. A total of 10 genes, SOX9, EPC1, EPC2, CECR1, KLF9, METRNL, NKD1, NUMB, SPRED2 and DISP2, were significantly differentially expressed in colorectal cancers (p<0.05). As expected, all 19 genes changed in the same direction (2^-ΔΔCt) in colorectal cancers with both the SYBR Green approach and LDA. The median fold changes of these genes were significantly correlated between the two approaches (p=0.003, r=0.79, r²=0.637). Finally, we carried out a large clinicopathological survey of one of the genes upregulated in colorectal cancer, SOX9, a novel downstream transcription factor in the Wnt pathway. There were no significant mutations in the coding region of SOX9 in 20 colorectal cancers and 5 colonic cancer cell lines. Western blotting showed upregulation of SOX9 and β-Catenin in 16 and 15 of 21 colorectal cancers and in 5 and 2 of 5 adenomas, respectively. SOX9 and β-Catenin were significantly upregulated in cancers as compared with normal mucosa (mean±SD: SOX9 0.3293±0.3863 in normal mucosa, 907±0.6413 in cancer, p<0.01; β-Catenin 0.3397±0.2921 in normal mucosa, 0.6024±0.498 in cancer, p<0.05). SOX9+ cells were invariably located in the lower part of normal colonic mucosa, showing a pattern characteristic of Ki67 staining. There were more SOX9+ cells in the lower zone of the colonic mucosa (9.6±13.1%) than in the upper zone (3.1±5.9%) (p<0.001). In colorectal adenomas, SOX9+ cells could be seen along the whole dysplastic crypt, but the lower zone still contained more SOX9+ cells than the upper zone (50.7%±25.1% vs 32.9%±30.0%, p<0.001). There were more SOX9+ cells in colorectal adenomas (39.8%±30.9%) than in the peri-adenomatous normal mucosa (24.9%±18.2%) (p<0.01). 
The proportion of SOX9+ cells in cancer was 36.7%±30.3%, significantly higher than in normal mucosa and peri-carcinomatous normal residues, but not significantly different from that in adenomas (p>0.05). When a simple additive scoring system was applied as the working protocol for immunostaining assessment, the incidence of SOX9 positivity in both adenomas (74.5%, 70/94) and cancers (60.1%, 113/188) was higher than in normal mucosa (9.09%, 10/110) (p<0.001). Defining "-" and "+" as "low-expression" and "++" and "+++" as "overexpression", we found that SOX9 overexpression was significantly more common in colorectal adenomas (53.2%, 50/94) and cancers (34.0%, 64/188) than in normal mucosa, where no case showed SOX9 overexpression. SOX9 overexpression was less common in mucin-producing cancers (signet-ring cell cancer and mucinous adenocarcinoma) than in non-mucin-producing cancers (p<0.05). The 5-year overall survival was significantly lower in colorectal cancer patients with SOX9 overexpression (39.5%, 17/43) than in those with low expression (69.5%, 66/95) (p<0.01). Survival analysis with the Cox proportional hazards model indicated that SOX9 overexpression is an independent adverse prognostic factor in colorectal cancer (p<0.05, RR=1.381, 95% CI: 1.051-1.815). In conclusion, we demonstrated that colorectal cancer is heterogeneous at the molecular level. Alternative splicing is common in cancers, implying a potential role in carcinogenesis. Enrichment of ribosomal protein genes is an important molecular feature of the two most important precursor lesions of colorectal cancer, adenoma and inflammatory bowel disease. Thus, the electronic gene expression libraries will help us to explore the general molecular features underlying colorectal cancer initiation and progression. 
We also suggest that dysregulation of pathways and gene families associated with colonocyte differentiation, such as Wnt activation, inhibition of TGFβ/BMP4, Hedgehog and Notch, and aberrant expression of PcG family genes, has a potential role in colorectal carcinogenesis. The Wnt pathway might play a central role in the intricate network of these pathways. In particular, we found that SOX9, a novel transcription factor in the Wnt pathway, was associated with intestinal epithelial differentiation. SOX9 was overexpressed in colorectal adenomas and cancers, and its overexpression was an adverse prognostic factor for colorectal cancer; it might therefore serve as a potential target for gene therapy in the future. Finally, we demonstrated that LDA is a sensitive and reliable high-throughput quantitative PCR method. The combined application of dbEST data mining and LDA is a novel methodology for cancer biomarker detection. |
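
The fold changes reported for the LDA and SYBR Green assays are expressed as 2^-ΔΔCt. The following is a minimal sketch of that calculation, assuming the standard comparative Ct method with GAPDH as the reference gene. The function name and the Ct values are hypothetical illustrations and are not taken from the study.

```python
def fold_change(ct_target_cancer, ct_ref_cancer, ct_target_normal, ct_ref_normal):
    """Relative expression (cancer vs normal) by the comparative Ct method.

    Assumes roughly 100% amplification efficiency for both the target
    gene and the reference gene (GAPDH in the abstract).
    """
    # Normalize each sample's target Ct to its reference gene Ct.
    delta_ct_cancer = ct_target_cancer - ct_ref_cancer
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Difference between the normalized values of the two samples.
    delta_delta_ct = delta_ct_cancer - delta_ct_normal
    # A lower Ct means more template, hence the negative exponent.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target gene crosses the threshold 2 cycles
# earlier in cancer than in normal mucosa, with identical GAPDH Ct values.
example = fold_change(24.0, 18.0, 26.0, 18.0)
print(example)
```

A value above 1 indicates upregulation in cancer. For a downregulated gene the result falls below 1, and its reciprocal gives the fold decrease.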