Integrated Genetic And Epigenetic Data Analysis In Cancer Genome

Posted on: 2012-05-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Dong
Full Text: PDF
GTID: 1484303356468534
Subject: Bioinformatics
Abstract/Summary:
Part 1: Integrated analysis of mutations, microRNAs and gene expression in Glioblastoma Multiforme

Glioblastoma arises from complex interactions between a variety of genetic alterations and environmental perturbations. Little attention has been paid to how genetic variation, altered gene expression and microRNA (miRNA) expression are integrated into networks that act together to alter regulation and ultimately give rise to complex phenotypes and glioblastoma. To perform an integrated genetic and epigenetic analysis of glioblastoma, we used data from the TCGA pilot project: 601 genes were sequenced for somatic mutations in 179 tumor tissue samples and 179 matched normal tissue samples; expression of 12,042 genes was measured in 243 tumor tissue samples, 10 normal tissue samples and one cell line; and expression of 470 human miRNAs was profiled in 240 tumor tissue samples and 10 normal tissue samples. We identified associations between glioblastoma and somatic mutations in 14 genes, 8 of which are newly identified, and associations between glioblastoma and loss of heterozygosity (LOH) in 11 genes; 9 of these genes are newly discovered and first reported here. By gene coexpression network analysis, we identified 15 genes essential to the function of the network, most of which are cancer-related genes. We also constructed miRNA coexpression networks and found 19 important miRNAs, 3 of which were significantly related to the survival of glioblastoma patients. We identified 3,953 predicted miRNA-target-gene pairs, 14 of which had previously been verified experimentally by other groups. Pathway enrichment analysis showed that the genes in the target network of the top 19 important miRNAs were mainly involved in cancer-related signaling pathways, synaptic transmission and nervous system processes. Finally, we deciphered the pathways connecting mutations, gene expression and miRNA expression with glioblastoma.
For somatic mutations, we identified 4 cis-expression quantitative trait loci (cis-eQTL): TP53, EGFR, NF1 and PIK3C2G, together with 262 trans-eQTL and 26 trans-miRNA-eQTL. For loss of heterozygosity (LOH), we identified 2 cis-eQTL: NRAP and EGFR, together with 409 trans-eQTL and 27 trans-miRNA-eQTL. Our results demonstrate that integrated analysis of multi-dimensional data has the potential to unravel the mechanisms of tumor initiation and progression.

Part 2: Genome-wide association study of Copy Number Variation in GBM

Copy number variation (CNV) constitutes a large proportion of total genomic variation and is increasingly recognized as an important risk factor for cancer. To examine the role of CNVs in glioblastoma, we conducted a genome-wide association study of CNVs by assaying 221 tumor tissue samples and 28 normal tissue samples from primary glioblastoma multiforme patients in the TCGA project. CNVs were measured with the Affymetrix Genome-Wide Human SNP Array 6.0, which carries 906,600 SNPs and more than 946,000 probes for the detection of copy number variation. CNVs were called with a modified hidden Markov model (HMM), and 163,024 CNV loci were detected. A total of 104 CNV loci with P-value < 3.70E-7 showed significant association with glioblastoma. We also performed group association tests for CNVs at the gene and pathway levels. We identified 169 genes with P-value < 4.77E-6 that were significantly associated with glioblastoma, including the oncogene BCAS1, the tumor suppressor genes CAMTA1, APC and CSMD1, the transcription factor ELF2, and the transcription activator genes ETV1, CREB5 and ZHX3. We also identified 15 pathways significantly associated with glioblastoma at an FDR-adjusted P-value < 0.05.
These significant pathways are: Metabolism of xenobiotics by cytochrome P450, Calcium signaling pathway, Axon guidance, Colorectal cancer, Tight junction, Regulation of eIF2 pathway, Double Stranded RNA Induced Gene Expression pathway, Glioma, Glycan structures-biosynthesis 1, Jak-STAT signaling pathway, Drug metabolism-cytochrome P450, Keratinocyte Differentiation pathway, Telomerase RNA component gene (hTerc) Transcriptional Regulation, Skeletal muscle hypertrophy is regulated via AKT/mTOR pathway, and BCR Signaling pathway. Furthermore, we identified which mRNAs and microRNAs were significantly affected by these CNV changes. Copy number changes in the 169 genes significantly affected the expression of 19 microRNAs and 410 genes, among which 3 differentially expressed microRNAs and 90 differentially expressed genes were regulated by 18 copy-number-variable genes. Our results provide important clues for investigating the mechanisms and drug targets of glioblastoma.

Part 3: Relative Impact of Genetic and Epigenetic Factors on Gene Expression in Tumor Tissues

Gene expression is influenced by genetic factors such as mutations, SNPs and CNVs, and by epigenetic factors such as methylation, histone modifications and miRNA. An essential step in understanding how genetic and epigenetic variation regulates gene expression is to estimate the proportion of gene expression variation explained by SNPs, CNVs, methylation and miRNA variation. Traditional single-marker analysis misses many variants that individually have small effects but collectively make a large contribution to phenotypic variation, and it also misses potential linear and nonlinear interactions among genomic and epigenomic variants. Our study extends approaches that use all SNPs to estimate the contribution of SNPs to a quantitative trait: we estimate the proportion of gene expression variance explained by genomic and epigenomic variants using all available genomic and epigenomic information.
Ultrahigh-dimensional genomic and epigenomic data pose great challenges. To meet them, we propose using sparse locally linear embedding (LLE), a sparse manifold learning algorithm, as a powerful tool for high-dimensional data reduction. We then apply lasso regression to the reduced data in the low-dimensional space to estimate the impact of genomic and epigenomic variants on gene expression. Sparse LLE and lasso regression were applied to two TCGA tumor tissue datasets: Glioblastoma Multiforme (198 tumor tissue samples) and Ovarian Cancer (512 tumor tissue samples). We report several notable findings. First, on average, expression variance was explained mainly by miRNA, methylation and CNVs rather than by SNPs. In particular, the contribution of miRNA and methylation to gene expression variation is larger and more direct than that of CNVs and SNPs. Second, the contribution of SNPs to miRNA and methylation variation is small. The contribution of CNVs to miRNA variation is also small, but their contribution to methylation variation cannot be ignored. These observations were replicated in both the GBM and ovarian cancer studies. Our study demonstrates the feasibility and power of sparse manifold learning and lasso regression for evaluating the contribution of genetic and epigenetic variation to gene expression variation.
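As an illustration of the final step described in Part 3, the following minimal Python sketch fits a lasso regression by coordinate descent and reports the in-sample proportion of expression variance explained by each block of features. It is not the dissertation's implementation: the sparse LLE reduction is omitted (the feature blocks are assumed to be already low-dimensional), the data are synthetic, and every name in it (`lasso_coordinate_descent`, `variance_explained`, `mirna_block`, `snp_block`, the penalty value) is hypothetical.

```python
import random


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, response, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 on centered data.

    columns is a list of feature columns, each a list of n values.
    """
    n = len(response)
    coefs = [0.0] * len(columns)
    residual = list(response)  # y - Xb, with b starting at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue
            # Inner product of column j with the partial residual.
            rho = sum(c * r for c, r in zip(col, residual)) + coefs[j] * norm_sq
            new_coef = soft_threshold(rho / n, penalty) / (norm_sq / n)
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                coefs[j] = new_coef
    return coefs


def variance_explained(columns, response, coefs):
    """In-sample R^2 = 1 - RSS/TSS for a centered response."""
    n = len(response)
    fitted = [sum(b * col[i] for b, col in zip(coefs, columns)) for i in range(n)]
    rss = sum((y - f) ** 2 for y, f in zip(response, fitted))
    tss = sum(y * y for y in response)
    return 1.0 - rss / tss


# Synthetic example (assumption): expression depends on the "miRNA" block only.
random.seed(0)
n_samples = 100
mirna_block = [center([random.gauss(0, 1) for _ in range(n_samples)]) for _ in range(2)]
snp_block = [center([random.gauss(0, 1) for _ in range(n_samples)]) for _ in range(2)]
expression = center([
    2.0 * mirna_block[0][i] - 1.5 * mirna_block[1][i] + random.gauss(0, 0.5)
    for i in range(n_samples)
])

penalty = 0.05  # hypothetical value; in practice chosen by cross-validation
r2_mirna = variance_explained(
    mirna_block, expression, lasso_coordinate_descent(mirna_block, expression, penalty))
r2_snp = variance_explained(
    snp_block, expression, lasso_coordinate_descent(snp_block, expression, penalty))
print(f"variance explained by miRNA block: {r2_mirna:.3f}")
print(f"variance explained by SNP block:   {r2_snp:.3f}")
```

Comparing the two values mirrors the kind of block-by-block comparison the abstract reports. In-sample R^2 is optimistic, so a real analysis would more plausibly use held-out samples or cross-validation.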
Keywords/Search Tags: Glioblastoma Multiforme, mutation, eQTL, network, integrated analysis, Copy Number Variation, Association Study, Pathway analysis, Manifold Learning, Locally Linear Embedding, Lasso regression, Ovarian Cancer