| Cancer is a complex disease. In addition to the traditional epidemiological predisposing factors, a growing number of studies show that genetic factors play an important role in the development of cancer. In the past, molecular cancer epidemiological studies used a strategy based on candidate genes or pathways, evaluating the association between single nucleotide polymorphisms (SNPs) in the candidate genes or biological pathways and the risk of cancers. More recently, with the development of high-throughput genotyping technologies and decreasing costs, genome-wide association studies (GWASs) have been widely conducted. Many novel susceptibility loci have been identified for various cancers by a number of large-scale, high-density GWASs. This methodological revolution in molecular epidemiology has greatly improved our understanding of the genetic basis of many diseases, including cancers.

GWASs have successfully identified several new loci associated with cancer risk. However, traditional GWASs have focused only on the most significant SNPs that fulfilled a stringent "genome-wide" significance criterion. Hence, most genetic variants with relatively small effects would be missed. Given that a large proportion of disease susceptibility genes tend to be functionally related and/or interact with one another in biological pathways, pathway-based approaches have been developed to evaluate the joint effects of functionally related genes, which may help uncover the associations of genes with small effect sizes that cluster within common biological pathways. The traditional pathway-based approaches simply assigned SNPs to nearby genes based on their physical locations. This strategy has some limitations. First, many SNPs in a gene region might not represent functional variants of the gene, and including all SNPs in a gene region may increase the multiple testing burden.
Second, a SNP located in a structural gene but regulating the expression of another gene may not be annotated with its functional relevance. More recently, with advances in high-throughput gene expression profiling and genotyping technologies, GWASs of gene expression, which study the genetic variants regulating gene expression at a genomic scale, have been conducted. Thousands of genetic loci affecting the expression of specific transcripts have been identified, and each of these loci is called an expression quantitative trait locus (eQTL). Instead of assigning SNPs to genes by physical location, defining the eQTLs and assigning them to the genes they regulate can help functionally annotate SNPs. The integration of skin eQTL data and the melanoma GWAS data may help filter out a large number of false positive associations and enrich the true disease signals in the biological pathways.

Based on these considerations, we conducted the following studies: (1) we applied the traditional pathway-based approach to the GWAS of basal cell carcinoma (BCC) of the skin, aiming to further explore the genetic information hidden in the GWAS data and identify novel biological pathways for BCC; (2) we integrated the genetics of gene expression into the pathway analysis for the BCC GWAS, aiming to identify more robust pathways associated with BCC by the new approach.

In summary, our study attempts to utilize the GWAS data to a greater extent. The application of the pathway-based approach, as well as the new approach that integrates the genetics of gene expression with the pathway-based approach, to the BCC GWAS data may help with the discovery of novel pathways involved in the etiology of BCC. The pathway-based approaches combine the GWAS data with existing biological knowledge and can make fuller use of the genetic information in the GWAS data. These approaches can also be applied to other diseases for further data mining of GWAS data.
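The contrast drawn above between positional and eQTL-based assignment of SNPs to genes can be illustrated with a minimal sketch. It is not the study's pipeline: the function names, the input layouts, and the toy identifiers (`rs_a`, `GENE_A`, `GENE_B`) are hypothetical, and the 20 kb default window simply mirrors the distance mentioned in Part I.

```python
from collections import defaultdict


def assign_by_position(snps, genes, window=20_000):
    """Map each SNP to every gene whose region (padded by `window` bp) contains it.

    snps:  {snp_id: (chrom, position)}          -- hypothetical layout
    genes: {gene_id: (chrom, start, end)}       -- hypothetical layout
    """
    mapping = defaultdict(set)
    for snp, (chrom, pos) in snps.items():
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                mapping[gene].add(snp)
    return dict(mapping)


def assign_by_eqtl(eqtl_pairs):
    """Map each SNP to the gene(s) whose expression it is associated with.

    eqtl_pairs: iterable of (snp_id, regulated_gene_id), assumed to come from
    an external eQTL study; physical distance plays no role here.
    """
    mapping = defaultdict(set)
    for snp, gene in eqtl_pairs:
        mapping[gene].add(snp)
    return dict(mapping)


# Toy example: rs_a lies inside GENE_A but is a trans-eQTL for GENE_B.
snps = {"rs_a": ("chr1", 105_000)}
genes = {"GENE_A": ("chr1", 100_000, 110_000), "GENE_B": ("chr2", 5_000, 9_000)}
positional = assign_by_position(snps, genes)
functional = assign_by_eqtl([("rs_a", "GENE_B")])
```

In this toy case the positional rule credits `rs_a` to `GENE_A`, while the eQTL rule credits it to `GENE_B`, which is the situation the second limitation describes.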

Part I: Traditional Pathway Analysis for the Genome-wide Association Study on Basal Cell Carcinoma of the Skin

GWASs have identified several new loci associated with BCC. However, the most significant SNPs that fulfilled a stringent statistical "genome-wide" significance criterion often account for only a small proportion of genetic susceptibility. More attention should be paid to the rest of the genetic information, which may offer a deeper understanding of the genetics of complex diseases. Combining the modest association signals in the GWAS data with information on biological pathways and networks, the emerging pathway-based approaches are designed to utilize the GWAS data to a greater extent and are likely to yield new insights into disease etiology. Using an approach based on the gene-set enrichment analysis algorithm, pathway analyses have been applied to the GWASs of several complex diseases, and some novel candidate disease-susceptibility pathways have been revealed.

In this study, we applied this approach to the BCC GWAS. We first conducted the BCC GWAS among 1,797 BCC cases and 5,197 controls in Caucasians with 740,760 genotyped SNPs. A total of 115,688 SNPs were grouped into gene transcripts within 20 kb in distance and then into 174 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, 205 BioCarta pathways, and two positive control gene sets: the pigmentation gene set and the BCC risk gene set. The association of each pathway with BCC risk was evaluated using the weighted Kolmogorov-Smirnov test. One thousand permutations were conducted to assess the significance.

Both of the positive control gene sets reached pathway P < 0.05. Four other pathways were also significantly associated with BCC risk: the heparan sulfate biosynthesis pathway (P = 0.007, false discovery rate [FDR] = 0.35), the mCalpain pathway (P = 0.002, FDR = 0.12), the Rho cell motility signaling pathway (P = 0.01, FDR = 0.30), and the nitric oxide pathway (P = 0.02, FDR = 0.42).
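
The weighted Kolmogorov-Smirnov test with permutation-based significance described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the study's implementation: it assumes each gene has already been summarized by a single association statistic, and it builds the null distribution by shuffling those statistics among genes, whereas the permutation scheme actually used is not specified here. The function names and toy data are hypothetical.

```python
import random


def enrichment_score(gene_stats, pathway, weight=1.0):
    """Weighted Kolmogorov-Smirnov running-sum statistic (GSEA-style).

    gene_stats: {gene: association statistic}, larger = stronger signal.
    pathway:    set of genes belonging to the pathway.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    ranked = sorted(gene_stats, key=gene_stats.get, reverse=True)
    hits = [g in pathway for g in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(gene_stats[g]) ** weight
                    for g, is_hit in zip(ranked, hits) if is_hit)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g, is_hit in zip(ranked, hits):
        if is_hit:
            # Step up in proportion to the gene's (weighted) statistic.
            running += abs(gene_stats[g]) ** weight / hit_total
        else:
            # Step down by a constant for genes outside the pathway.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p(gene_stats, pathway, n_perm=1000, seed=0):
    """Empirical one-sided P value for the observed enrichment score.

    ASSUMPTION: statistics are shuffled among genes; this ignores gene size
    and linkage disequilibrium and is only meant to show the mechanics.
    """
    rng = random.Random(seed)
    observed = enrichment_score(gene_stats, pathway)
    genes = list(gene_stats)
    values = [gene_stats[g] for g in genes]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if enrichment_score(dict(zip(genes, values)), pathway) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: the two pathway genes carry the strongest statistics.
toy_stats = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 1.0, "g5": 0.5, "g6": 0.2}
toy_es, toy_p = permutation_p(toy_stats, {"g1", "g2"})
```

Adding one to both the numerator and the denominator keeps the empirical P value above zero, so with 1,000 permutations the smallest reportable value is roughly 0.001.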

We identified four pathways associated with BCC risk, which may offer new insights into the etiology of BCC upon further validation. This approach may help identify potential biological pathways that might be missed by the standard GWAS approach.

Part II: Integrating Genetics of Gene Expression into Pathway Analysis for the Genome-wide Association Study of Basal Cell Carcinoma

The traditional pathway-based approaches simply assigned SNPs to nearby genes based on their physical locations. However, SNPs in a gene region might not represent functional variants of the gene. Furthermore, a gene may be regulated in trans by genetic variants that are physically distant from the structural gene. Therefore, a new approach integrating the genetics of gene expression with GWAS pathway analysis is appealing, as it may increase the likelihood of enriching true signals in the biological pathways based on functional relevance. It has been demonstrated that SNPs associated with gene expression in liver and adipose tissues are enriched for association with type 2 diabetes. By integrating the eQTLs of liver and adipose tissues into the pathway analysis for the GWAS of type 2 diabetes, several well-known disease-related pathways as well as some novel ones have been identified, which provide new insights into the etiology of this disease.

In this study, we applied the new pathway-based approach, which combines the genetics of gene expression with the functional classification of genes, to the BCC GWAS to identify potential biological pathways associated with BCC. We first identified 322,324 expression-associated single-nucleotide polymorphisms (eSNPs) from two existing GWASs of global gene expression in lymphoblastoid cell lines (n = 995), and evaluated the association of these functionally annotated SNPs with BCC among 2,045 BCC cases and 6,013 controls in Caucasians.
We then grouped them into 99 KEGG pathways for pathway analysis and identified two pathways associated with BCC with P < 0.05 and FDR < 0.5: the autoimmune thyroid disease pathway (mainly HLA class I and II antigens; P < 0.001, FDR = 0.24) and the JAK-STAT signaling pathway (P = 0.02, FDR = 0.49). Seventy-nine (25.7%) of the 307 significant eSNPs in the JAK-STAT pathway were associated with BCC risk (P < 0.05) in an independent replication set of 278 BCC cases and 1,262 controls. In addition, the association of the JAK-STAT signaling pathway was marginally validated using 16,691 eSNPs identified from 110 normal skin samples (P = 0.08). Based on the evidence for the biological functions of the JAK-STAT pathway in oncogenesis, it is plausible that this pathway is involved in BCC pathogenesis.

By integrating the functional annotation of SNPs from existing eQTL databases into the pathway analysis for the BCC GWAS, we identified the JAK-STAT signaling pathway as associated with BCC risk; this association was also marginally replicated using skin eQTL information from a small dataset. This approach, which integrates eQTL information into pathway analysis, helped us utilize the GWAS data to a greater extent. It may also help identify potential biological mechanisms underlying other diseases using GWAS data. |