Background & aim

Gastric cancer (GC) is one of the most common malignancies and the second leading cause of cancer-related death worldwide. However, its mechanism remains unclear. Although some environmental factors, such as diet, cigarette smoking and Helicobacter pylori infection, may contribute to carcinogenesis of gastric epithelial cells [2-4], only a fraction of the population exposed to such risk factors develops GC during their lifetime. This suggests that genetic factors play a crucial role in determining an individual’s susceptibility to GC.

Mucins are a group of diverse, complex, highly glycosylated extracellular proteins important in maintaining epithelial homeostasis. Cancer cells are often observed to express aberrant forms or amounts of mucins, and these aberrations are thought to play a role in carcinogenesis, especially in the regulation of tumor cell differentiation, proliferation and invasion. MUC5AC is a secreted gel-forming mucin and a marker of gastric foveolar epithelial cells. Altered levels of MUC5AC expression may be involved in GC pathogenesis. Functional genetic polymorphisms in the regulatory region could affect MUC5AC gene expression and may contribute to an individual’s susceptibility to gastric cancer.

Repetitive regions of DNA are common throughout the human genome and are characterized by their dynamic, unstable features. They are a major generator of genetic variation and are considered to underlie substantial genetic variability, with novel mutations in such regions explaining much of the ‘missing’ heritability in polygenic diseases, including GC. However, this kind of genetic variation cannot be included in genome-wide association study (GWAS) panels and is challenging to assess reliably. An intensive review of the MUC5AC upstream regulatory region and its surroundings identified a complicated repetitive region (termed the MUC5AC-u repetitive region).
We undertook this case-control study to determine the nature and extent of genetic polymorphisms within this region, and to explore the association of each genetic variant with the occurrence and progression of GC.

Methods

1. The UCSC Genome Browser (http://genome.ucsc.edu) and the GRCh37/hg19 release of the human genome were used to generate a map showing the location and major genomic features of the MUC5AC gene, including histone H3 lysine 27 acetylation (H3K27AC) status, transcription factor binding sites, common single-nucleotide polymorphisms (SNPs), and repetitive sequence features of the upstream region. The DNA sequence of the upstream region was downloaded from Ensembl (http://useast.ensembl.org/Homo_sapiens/Info/Index).

2. Two hundred and thirty patients with GC were recruited in Shandong Province, northeastern China. Three hundred and twenty-eight cancer-free individuals without any detectable or known cancers were recruited as controls. All control subjects lived in the same residential areas as the cases and were matched to the patients with GC by age and sex. All subjects were genetically unrelated ethnic Han Chinese. Each subject was evaluated individually with a pretested questionnaire to obtain demographic data and information on related risk factors, including tobacco smoking and alcohol consumption. Clinical data and pathological characteristics of patients were collected and confirmed from their medical history records and questionnaires, and GC tumor, node and metastasis (TNM) stages were classified according to the system of the World Health Organization (WHO).

3. A 1 mL peripheral blood sample was collected from each subject. Genomic DNA was isolated from each sample using a modified salt extraction technique.
We also obtained tissue samples from 36 GC patients in our cohort; the samples from each patient consisted of cancerous tissue, the respective para-carcinoma tissue (defined as being 1.0 cm away from the tumor mass) and surrounding noncancerous gastric mucosal tissue. Genomic DNA was extracted from these samples using the Blood and Cell Culture DNA Mini Kit (Tiangen Biotech, Beijing, China).

4. MUC5AC-u repetitive region genotyping was performed using the polymerase chain reaction (PCR). The gene-specific primer sequences were as follows: sense 5’-TCCACCCTAACCCTGTCAGCCGC-3’; antisense 5’-GTGGCAGGAGTGTGGGGAAAGGG-3’. PCR amplification was performed in a total reaction volume of 50 μL, containing 100 ng genomic DNA, 0.2 μM of each primer and 25 μL PrimeSTAR Max DNA Polymerase (Takara, Japan). PCR was conducted in a 9700 thermal cycler (Perkin-Elmer, CA, USA) as follows: a 5-minute initial denaturation at 94 ℃, followed by 30 cycles of 10 s at 98 ℃ and 2 minutes at 68 ℃. PCR products were analyzed by gel electrophoresis (1 V/cm) in TAE buffer through a 1.0% agarose gel.

5. To confirm the genotyping results, selected PCR-amplified DNA samples (amplicons) were sent to BGI Tech (Beijing, China) for purification and Sanger sequencing. This assay was conducted blind with respect to the specimens and study design.

6. SPSS 13.0 software (SPSS, Chicago, IL, USA) was employed for statistical analysis.

Results

1. An intensive review of the MUC5AC upstream region identified a complicated 1710 bp repetitive region (termed the MUC5AC-u repetitive region) located between nucleotides -3162 and -1452 upstream of the ATG initiation codon. This position is immediately downstream of a genomic locus with the capacity to bind several transcription factors. The MUC5AC-u repetitive region contains many interrupted, irregular repeats of different lengths and is a complicated combination of microsatellite (e.g., CTCA), minisatellite (e.g., CATTCACT or CATTCACTCATT) and megasatellite (e.g., ACCCATTCACTCACTCACTTATTCACTC) repeats.
At the 5’ end of the region, a 300 bp sequence was found to be duplicated exactly, head-to-tail.

2. All individuals in the study (328 cancer-free controls and 230 GC patients) were from a Han Chinese population and had no known hereditary disease. Both groups had similar distributions of age, sex and alcohol consumption. There was no significant difference in the distribution of cigarette smoking between the patients and controls (p = 0.098). According to the TNM system, 10.9%, 10.0%, 21.7%, 42.2% and 15.2% of patients had stage 0, I, II, III and IV disease, respectively.

3. Genomic DNA samples were isolated from whole blood of all the subjects and used as templates to amplify the MUC5AC-u repetitive region. Eight alleles with discontinuous sizes ranging from 1.1 to 2.8 kb were identified in this Han Chinese population. The 1.1 kb allele was most common, the 1.8 and 2.0 kb alleles were less common, and the others were all relatively uncommon.

The overall distribution of the MUC5AC-u repetitive region alleles among patients with GC differed significantly from that found in controls (χ² = 58.44, p = 3.09×10⁻¹⁰). For further analysis, allele frequencies in patients and controls were compared individually for each allele using Fisher’s exact test (Table 1). The 1.4 and 1.8 kb alleles were significantly more prevalent in patients with cancer than in controls (3.9% vs. 0.0%, pc = 3.00×10⁻⁶; 35.4% vs. 25.8%, pc = 1.56×10⁻², respectively, where pc denotes the p value corrected for multiple comparisons). Additionally, the frequencies of the 2.3 and 2.8 kb alleles were lower in patients with cancer than in controls (3.3% vs. 9.0%, p = 1.51×10⁻⁴; 0.0% vs. 1.8%, p = 0.002, respectively); the corresponding corrected p values were 4.68×10⁻³ for the 2.3 kb allele and 0.062 (suggestive) for the 2.8 kb allele. No significant differences between cases and controls were found for the frequencies of the other alleles.
Based on these observations, we classified the eight alleles as susceptible (S), protective (P), or null with respect to risk (N) as follows: S, 1.4 or 1.8 kb; P, 2.3 or 2.8 kb; and N, all other alleles. In total, twenty-one MUC5AC-u repetitive region genotypes were identified in our case-control population (Table S3 in supporting information files). The genotypes were then grouped as NN, SN, PN, SP and SS; no PP genotype was observed in our cohort. The most common genotype (NN) was designated as the reference group. Individuals with the homozygous genotype SS had a 2.7-fold increased risk of GC occurrence (OR = 2.683, 95% CI = 1.554-4.361, pc = 0.012; Table 2). The PN genotype was associated with a significantly reduced risk of GC (OR = 0.257, 95% CI = 0.116-0.569, pc = 0.031). Neither of the heterozygous genotypes SN and SP was associated with a change in the risk of GC (both p > 0.05).

4. In our sample, fifteen GC patients (6.5%) carried the 1.4 kb allele; three of them were homozygous for this allele and the remainder were heterozygous. Compared with patients lacking the allele, GC patients with at least one copy of the 1.4 kb allele were more often younger (<50 years) and more often had advanced T (T4) and M (M1) stages (66.7% vs. 17.2%, p = 4.37×10⁻⁶; 93.3% vs. 58.6%, p = 0.006; 53.3% vs. 12.6%, p = 2.13×10⁻⁵, respectively). After correction for multiple comparisons, the differences in age and M stage remained significant (pc = 1.35×10⁻⁴ and 6.60×10⁻⁴, respectively), whereas the difference in T stage did not (pc = 0.186).

There were 128 GC patients (55.7%) in our sample who carried the 1.8 kb version of the MUC5AC-u repetitive region; 35 patients were homozygous for this allele. Homozygous patients tended to have an older age of onset (≥50 years) and less advanced T (Tis-T3), N (N0) and TNM (stage 0-II) stages compared with patients who were not homozygous for the 1.8 kb allele (5.7% vs. 23.1%, p = 0.021; 60.0% vs. 35.4%, p = 0.006; 51.4% vs. 29.2%, p = 0.010; 68.6% vs. 37.9%, p = 7.43×10⁻⁴, respectively), although most of the nominally significant p values did not survive the Bonferroni correction.

5.
As repetitive regions of DNA are unstable in various human malignancies, including GC [24], we next determined whether the hypervariable MUC5AC-u repetitive region differed in length between cancer, para-carcinoma and surrounding normal tissues from 36 GC patients. The results showed no differences in band pattern between para-carcinoma and normal tissues in any of the 36 patients; however, length alterations were observed in DNA samples of cancer tissues in two GC patients (Figure 3). In both cases, bands were detected showing a shift from long alleles in cancer tissue to short alleles in para-carcinoma tissue. In one case, one allele shifted from 2.0 kb to a novel 0.9 kb allele and, in the other case, one allele shifted from 2.3 kb to 1.4 kb. Among the 36 gastric cancer patients tested, the frequency of cancer-related genome rearrangement in the MUC5AC-u repetitive region was therefore 5.6% (2 of 36).

6. PCR amplicons of the 1.1 kb and 1.4 kb alleles from gastric cancer tissue DNA were successfully sequenced using the Sanger sequencing technique. These sequences are listed in the supporting information files. We were unable to sequence the entire fragment of a 1.8 kb amplicon (amplified from gastric cancer tissue DNA), or any other fragment longer than 1.8 kb, because of the complicated and repetitive structure of the target region and the limitations of the technique. The sequences show the same main genetic structure and repetitive units as the UCSC genome reference sequence but differ in overall length. The initial 300 bp at the 5’ end of the 1.4 kb MUC5AC-u repetitive region sequence are exactly duplicated in a head-to-tail pattern.

Conclusion

In this study, length polymorphisms in a complicated repetitive region adjacent to the MUC5AC promoter were assessed in 230 patients with GC and 328 cancer-free controls. Alleles of 1.4 and 1.8 kb were significantly more prevalent in the GC group than in controls. In contrast, the 2.3 and 2.8 kb alleles occurred at significantly lower frequencies in patients than in controls.
Individuals with genotype SS had a 2.7-fold increased risk of GC occurrence, whereas the PN genotype was associated with a significantly reduced risk of this cancer. Moreover, individuals carrying one or two copies of the 1.4 kb allele showed an earlier age of onset and a more advanced metastasis stage compared with patients without this allele (Bonferroni-corrected p = 1.35×10⁻⁴ and 6.60×10⁻⁴, respectively), whereas patients homozygous for the 1.8 kb allele tended to have a less advanced GC TNM stage. Our results suggest that certain genetic variations in the MUC5AC upstream repetitive region are associated with susceptibility to, and progression of, GC.
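
The allele frequencies and corrected p values reported in the Results can be cross-checked from the carrier counts given in the text. The following is a minimal Python sketch of that arithmetic, not the analysis actually run in SPSS. The function names (allele_frequency, bonferroni) are illustrative. The number of comparisons used for the Bonferroni correction is not stated in the text; the value 31 is an assumption inferred from the ratios of corrected to nominal p values (for example, 0.062 / 0.002).

```python
N_PATIENTS = 230  # GC patients genotyped (from the text)

# ASSUMPTION: the number of comparisons is not stated in the text; 31 is
# inferred from the ratios of corrected to nominal p values (0.062 / 0.002).
N_TESTS = 31


def allele_frequency(carriers, homozygotes, n_subjects):
    """Allele frequency in a diploid sample, derived from carrier counts.

    Each heterozygous carrier contributes one copy of the allele and each
    homozygous carrier two, out of 2 * n_subjects chromosomes in total.
    """
    heterozygotes = carriers - homozygotes
    copies = heterozygotes + 2 * homozygotes
    return copies / (2 * n_subjects)


def bonferroni(p, n_tests=N_TESTS):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p * n_tests)


# Carrier counts among patients, as reported in Results item 4.
freq_14kb = allele_frequency(carriers=15, homozygotes=3, n_subjects=N_PATIENTS)
freq_18kb = allele_frequency(carriers=128, homozygotes=35, n_subjects=N_PATIENTS)

# Two of 36 patients showed a length alteration in cancer tissue (item 5).
rearrangement_rate = 2 / 36

print(f"1.4 kb allele frequency in patients: {freq_14kb:.1%}")
print(f"1.8 kb allele frequency in patients: {freq_18kb:.1%}")
print(f"Rearrangement frequency: {rearrangement_rate:.1%}")

# Nominal p values quoted in the text, keyed by the comparison they belong to.
nominal_p = {
    "2.3 kb allele, patients vs. controls": 1.51e-4,
    "2.8 kb allele, patients vs. controls": 0.002,
    "1.4 kb carriers, age < 50 years": 4.37e-6,
    "1.4 kb carriers, T4 stage": 0.006,
    "1.4 kb carriers, M1 stage": 2.13e-5,
}
for label, p in nominal_p.items():
    print(f"{label}: p = {p:.2e}, pc = {bonferroni(p):.2e}")
```

Under the assumed factor of 31, the corrected values agree with those reported (4.68×10⁻³, 0.062, 1.35×10⁻⁴, 0.186 and 6.60×10⁻⁴), and the allele frequencies derived from the carrier counts match the reported 3.9% and 35.4%.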