
Study On Genetic Susceptibility To Colorectal Cancer Based On Expression Quantitative Trait Loci (eQTLs) Analyses

Posted on: 2017-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lou
Full Text: PDF
GTID: 1314330482994397
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background:
Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide and a serious threat to human health. In China, an estimated 376,000 new CRC cases and 191,000 CRC deaths occurred in 2015, and the five-year survival rate was only about 40%. CRC has therefore become a major public health problem in China. A primary cause of the high morbidity and mortality is that the etiological factors and pathogenetic mechanisms underlying CRC development remain obscure. It is well established that CRC is a complex trait influenced by genetic and environmental factors and their long-term interactions, among which genetic factors determine cancer susceptibility. So far, genome-wide association studies (GWASs) have identified a number of single nucleotide polymorphisms (SNPs) associated with CRC susceptibility, but they cannot fully reveal the biological mechanisms underlying CRC development, and the SNPs identified might not be causal variants. It is therefore critical to explore the susceptibility loci, identify the real causal variants, and link them with disease etiology.

It is estimated that about 80% of the SNPs identified by GWASs are located in non-coding regions of the genome, which implies a role in gene regulation. Expression quantitative trait loci (eQTL) analyses can localize regions of the genome containing DNA sequence variants that influence the expression level of one or more genes, and thus provide an effective and feasible strategy for interpreting the biological relevance of SNPs in non-coding regions. Moreover, at least some eQTLs are tissue specific, and when eQTL analyses are performed in tumor tissues, the effects of somatic and epigenetic alterations that can affect gene expression should be taken into consideration.

Public databases established so far, such as the GWAS Catalog, the Encyclopedia of DNA Elements (ENCODE), and The Cancer Genome Atlas (TCGA), help researchers mine the data more deeply and find the real causal variants, which is of great importance for further clarifying the mechanisms of CRC development.

Objectives:
1. To explore the eQTLs located in the susceptibility loci identified by CRC GWASs, using the CRC datasets in TCGA.
2. To evaluate the associations between these eQTLs and CRC risk in Han Chinese.
3. To identify the real causal variants underlying the eQTLs and to examine their biological functions.

Methods:
1. The CRC datasets were downloaded from the TCGA data portal. After population stratification analysis, two multivariate linear regressions were performed to adjust for the effects of somatic copy-number and CpG methylation alterations and to evaluate the associations between gene expression levels and the tagSNPs, together with SNPs in linkage disequilibrium (LD) with them, located in susceptibility loci identified by CRC GWASs. These are referred to as integrative cis- and trans-eQTL-based analyses. Next, motif analyses using bioinformatics resources such as ENCODE and Cistrome were conducted to find trans-eQTLs that act on their target genes through intermediaries (transcription factors, TFs).
2. After redundant SNPs in LD were removed using HaploView, the SNPs retrieved above were genotyped on the Sequenom MassARRAY platform in 768 CRC cases and 768 healthy controls from Beijing. Unconditional logistic regression analyses were applied to examine the associations between these SNPs and CRC risk.
3. For each significant SNP, several bioinformatics tools, such as RegulomeDB, rSNPBase, ANNOVAR, and GWAVA, were used to annotate and screen for potentially functional variants in the region of tight LD.
4. A two-stage case-control study comprising 1833 CRC cases and 2758 healthy controls was conducted to explore the associations between the potentially functional variants and CRC risk, using the TaqMan genotyping platform.
5. Dual luciferase reporter assays and electrophoretic mobility shift assays (EMSA) were performed to further explore the biological function of the candidate SNP and to validate its role in the genetic etiology of CRC.

Results:
1. A total of 254 CRC cases of European ancestry were obtained from the TCGA database. The integrative cis-eQTL-based analyses yielded 58 eQTL association pairs involving 54 SNPs and 8 target genes (false discovery rate-adjusted P<0.1), among which 5 genes were differentially expressed between CRC tumor tissues and paired normal tissues. Trans-eQTL-based analyses and motif analyses identified 15 SNPs and two corresponding TFs (MYC, ATF1).
2. After redundant SNPs were removed using HaploView, a total of 16 candidate SNPs were included in the first stage of the case-control study in Han Chinese. Five SNPs (rs6983267, rs174449, rs11169524, rs4768924 and rs4500718) showed significant associations with CRC risk under multiple models after Bonferroni correction. Relative to the major allele of each SNP, the additive-model results were: rs6983267, odds ratio (OR)=1.32, 95% confidence interval (CI)=1.14-1.53, P=0.0002; rs174449, OR=0.70, 95% CI=0.59-0.83, P=4.25×10⁻⁵; rs11169524, OR=1.33, 95% CI=1.13-1.55, P=0.0004; rs4768924, OR=0.75, 95% CI=0.64-0.88, P=0.0003; rs4500718, OR=0.68, 95% CI=0.57-0.81, P=2.28×10⁻⁵.
3. Apart from rs6983267 and rs174449, multiple bioinformatics analyses were used to select the three SNPs most likely to be functional (rs61926301, rs12424860 and rs16260) from the tight LD regions of the three significant SNPs (rs11169524, rs4768924 and rs4500718), respectively.
4. In the first stage (Beijing), rs61926301 and rs12424860 were significantly associated with CRC risk under the additive model (OR=1.34, 95% CI=1.14-1.57, P=0.00035; and OR=0.74, 95% CI=0.63-0.86, P=9.82×10⁻⁵, respectively). In the second stage (Wuhan), only rs61926301 showed a significant association with CRC risk (OR=1.22, 95% CI=1.09-1.37, P=0.0006, under the additive model). When the two stages were combined, rs61926301 was associated with a significantly increased risk of CRC under all genetic models (codominant model: GT vs GG, OR=1.27, 95% CI=1.12-1.44, P=0.0003; TT vs GG, OR=1.59, 95% CI=1.29-1.96, P=1.1×10⁻⁵; dominant model: OR=1.32, 95% CI=1.17-1.49, P=6.0×10⁻⁶; recessive model: OR=1.41, 95% CI=1.16-1.72, P=0.0006; additive model: OR=1.26, 95% CI=1.15-1.38, P=1.0×10⁻⁶).
5. Dual luciferase reporter assays indicated that the fragment carrying the rs61926301 T allele had stronger promoter activity than the fragment carrying the G allele in two CRC cell lines (SW480: P<0.0001; HCT116: P=0.0224). Moreover, EMSA indicated that the fragment carrying the rs61926301 T allele bound certain TFs more strongly than the fragment carrying the G allele.

Conclusions:
1. The rs61926301 G>T polymorphism, located in the 5' untranslated region of the ATF1 gene, was significantly associated with an increased risk of CRC in Han Chinese.
2. The rs61926301 G>T polymorphism might contribute to CRC risk by enhancing promoter activity and the ability to bind the relevant TFs, resulting in increased expression of target genes such as ATF1 and DIP2B. The biological function of rs61926301 is identified by this study for the first time, but its more detailed biological mechanism awaits further exploration.

Innovations:
1. This study used 254 CRC cases from the TCGA database with multiple data types, such as genotyping and RNA sequencing data, to explore the susceptibility loci identified by CRC GWASs. Compared with previous studies, this reduced experimental cost and provided a larger sample size and higher power for eQTL analyses.
2. This study performed integrative eQTL-based analyses that took into account the somatic and epigenetic alterations that can affect gene expression in tumor tissues, leading to more reliable results on the regulatory effects of germline genetic polymorphisms on their target genes.
3. This study combined multiple methods, including case-control association studies, bioinformatics analyses, and functional assays, to locate and validate the causal variants underlying the susceptibility loci identified by CRC GWASs.
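
The integrative eQTL adjustment described in the Methods can be illustrated with a minimal sketch. It assumes one plausible reading of the two-regression design: a first linear regression removes the part of a gene's expression explained by somatic copy number and CpG methylation, and a second regresses the residual expression on germline genotype dosage (0, 1 or 2 copies of the minor allele). The function names (`ols_fit`, `integrative_eqtl`), the synthetic data, and the simulated effect sizes are hypothetical and are not taken from the dissertation; only the sample size of 254 mirrors the abstract.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, residuals)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, y - X1 @ beta

def integrative_eqtl(expr, copy_number, methylation, genotype):
    """Two-step eQTL estimate (an assumed reading of the abstract's design)."""
    # Step 1: remove expression variance explained by somatic alterations.
    _, residual_expr = ols_fit(np.column_stack([copy_number, methylation]), expr)
    # Step 2: regress the residual expression on genotype dosage.
    beta, _ = ols_fit(genotype, residual_expr)
    return beta[1]  # per-allele effect on adjusted expression

# Synthetic data only: 254 tumors with a simulated per-allele effect of 0.5.
rng = np.random.default_rng(0)
n = 254
genotype = rng.integers(0, 3, size=n).astype(float)
copy_number = rng.normal(size=n)
methylation = rng.normal(size=n)
expr = (0.5 * genotype + 1.0 * copy_number - 0.8 * methylation
        + rng.normal(scale=0.1, size=n))

slope = integrative_eqtl(expr, copy_number, methylation, genotype)
print(f"estimated per-allele effect: {slope:.3f}")
```

In this sketch the genotype is not residualized against the covariates, so the estimated slope is slightly attenuated whenever genotype and the somatic covariates are correlated in the sample; fitting genotype and covariates jointly in a single model would avoid that.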
Keywords/Search Tags:colorectal cancer, genetic susceptibility, single nucleotide polymorphism, expression quantitative trait loci, association study, biological function