
Bioinformatics Analysis Of Predisposition Associated Gene And Gene Expression Profiles About Hepatocellular Carcinoma

Posted on: 2016-04-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G H Shang
GTID: 1224330482456580
Subject: Oncology
Abstract/Summary:
Part one: Bibliometrics and bioinformatics analysis of HCC predisposition-associated genes

Background and Purpose:
Hepatocellular carcinoma (HCC) ranks among the top three cancers by incidence, and the majority of cases are related to hepatitis B virus (HBV). China has more HCC patients than any other country. Because most patients are diagnosed at an advanced stage, few effective treatments are available to them. If lesions can be found at an early stage, the majority of patients with small HCC can be cured.

HCC most often develops in patients infected with HBV or hepatitis C virus (HCV), or in those with other etiologies and risk factors such as long-term alcohol abuse, aflatoxin exposure, autoimmunity, hemochromatosis, smoking, gender, fatty liver, diabetes, and metabolic syndrome. Patients with risk factors have long been advised to undergo regular screening to detect early cancer, which is an effective way to improve survival. However, only a small number of people eventually develop HCC; about 85-90% of infected individuals become inactive carriers with sustained biochemical remission and a very low risk of HCC. Individual differences in genetic background also have some influence. If clinical features and genetic risk can be assessed together in a patient, more valuable screening and treatment programs for precancerous lesions can be developed.

In recent years many studies of HCC susceptibility genes have been published, and many susceptibility genes have been found. This study aimed to evaluate this large body of work and to identify the key genes using bibliometric and bioinformatic methods.

A large part of the research on HCC susceptibility genes consists of case-control studies of pre-selected genes, in which candidate target genes are chosen according to their presumed functions.
PCR-RFLP, PCR-SSCP, the TaqMan probe method and similar techniques were used to find differences in gene distribution between cases and controls. These methods are relatively inexpensive, but only a small number of genes can be tested. In recent years a growing number of genome-wide association studies (GWAS) have searched for HCC susceptibility genes. A gene chip can detect millions of single nucleotide polymorphisms (SNPs) simultaneously, so hundreds of thousands of loci can be tested and more valuable genes can be found than expected.

A systematic review of the literature was performed using the Embase, PubMed and BIOSIS Previews databases. We gathered all HCC predisposition-associated genes. Bioinformatic techniques were then used to determine the key genes and to provide reference information for further study.

Methods:
1. Collection and analysis of the literature: We searched for relevant articles published from January 2001 to January 2014 in the Embase, PubMed and BIOSIS Previews databases. In addition, as original studies of HCC susceptibility genes have increased, meta-analyses have also increased rapidly. We therefore also analyzed the meta-analyses in this field published before January 31, 2015.
2. Gene bioinformatic analysis: HCC susceptibility genes collected from the literature were converted to unified official names using Clone/GeneID Converter and the NCBI Gene database. Gene Ontology (GO) classification and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed with the online software GATHER. In addition, the HCC susceptibility genes were uploaded to the online software STRING 9.1 to build a protein interaction network.
The network was imported into Cytoscape 3.0.2 for visualization, and its plug-in CentiScaPe 2.1 was used to calculate the topological characteristics of the network and of each node and to filter out the key nodes.

Results:
1. Bibliometric study results
A total of 3362 related publications were found in the Embase database, 1297 in the PubMed/Medline database, and 3902 in the BIOSIS Previews database. After merging the search results from the three databases, 708 articles related to the risk of HCC were included. The number of published papers has increased rapidly in the past 3 years, but most genes have been examined in only 1-3 studies. As of January 31, 2015, a total of 61 meta-analyses had been collected in this field, most of them by Chinese researchers. Thirty-four genes were analyzed by meta-analysis: CYP1A1, CYP2E1, EGF, EPHX1, ERCC2, GSTM1, GSTP1, GSTT1, HFE, HLA-DQ, HLA-DQB1, HLA-DRB1, IFNL3, IL10, IL1B, IL6, KIF1B, MDM2, MIR146A, MIR196A2, MIR499A, MTHFR, NAT2, OGG1, PNPLA3, PTGS2, SOD2, STAT4, TGFB1, TNF, TP53, UGT1A7, XRCC1, XRCC3. In addition, seven GWAS have been conducted in this field and have identified related genes and SNP sites.

2. Gene bioinformatics analysis
We obtained 201 HCC risk genes, which are mainly related to immune response, regulation of cellular physiological process, response to external stimulus, inflammatory response, DNA repair, regulation of cell cycle, cell proliferation, and apoptosis. A total of 741 GO functional classifications and 66 pathways were significantly enriched (p<0.05, FDR corrected). A complex protein-protein interaction network was constructed from 189 protein-coding genes, which suggested close interaction between these proteins. This network was visualized and analyzed with Cytoscape, and 11 key genes were determined: TP53, IL6, TGFB1, TNF, ESR1, VEGFA, HIF1A, STAT1, IFNG, SOD2, CTNNB1.

Conclusions:
1. Studies in this field are growing rapidly. Research institutions are mainly in East Asia and the Mediterranean region, where the incidence of HCC is high. Meta-analyses in this field are also increasing and are conducted mainly by Chinese researchers.
2. Most studies lack sufficient verification. Studies of pre-selected genes and GWAS each have advantages and disadvantages; both need verification in large samples from different races and genetic backgrounds.
3. Using GO classification and KEGG pathway analysis, we found that HCC susceptibility genes are concentrated in gene sets for stress response, inflammation and immune response, DNA repair, cell cycle regulation, detoxification and others.
4. We found 11 key genes, which merit further study because of their important functions.

Part two: Bioinformatics analysis of gene expression profiles in hepatocellular carcinoma

Background and objective:
There are many studies of the pathophysiological mechanisms and treatment of HCC, but compared with many other types of cancer, treatment research on HCC has made less progress. Most HCC patients are diagnosed at an advanced stage, so they have fewer opportunities for radical surgery, while the response rate to chemotherapy is very low. In recent years sorafenib, as a representative targeted therapy, has opened a new avenue for biologically targeted therapy of HCC. However, the clinical response rate to sorafenib is also not high, and patients face drug resistance after a period of treatment. Although there has been much research on molecular-targeted drugs, the results have not been satisfactory. In-depth research on the molecular mechanisms of HCC occurrence and progression is urgently needed to find suitable molecular targets and improve therapeutic efficacy.

Like many other cancers, HCC develops through a process from preneoplastic lesions through low-grade to high-grade lesions.
Most HCCs arise through a stepwise carcinogenic process from chronic hepatitis through cirrhosis and dysplastic nodules (DN) to HCC. Because of genetic instability, many gene sequence mutations and epigenetic changes occur during this process. Sequence changes lead to abnormal gene expression at the transcriptome level, which alters cell function and drives normal cells toward malignancy. Many genes become dysfunctional, including genes involved in cell proliferation, anti-apoptosis, invasion, metastasis, angiogenesis, stem cell characteristics, and the altered energy metabolism of tumor cells, among others. There are also many genetic changes with little influence on cell function. Genetic changes arising from different pathogenic causes are not exactly the same, but the pathological processes are similar, so there may be common molecular pathological changes, activated pathways and abnormal genes. We therefore combined gene expression profiles of HCC with different etiological backgrounds in order to find common molecular changes.

Many genes are abnormally expressed in HCC. How can the key genes be found? Previous research mostly focused on single genes, using case-control studies to relate genes to clinical phenotypes. However, the development and progression of cancer involve many genes simultaneously. These genes do not function independently; they tend to sit in various regulatory pathways and affect the functional status of the whole pathway. Pathway analysis can therefore identify biological pathways enriched in abnormal genes, from which the core mechanisms of carcinogenesis can be explored. Compared with single-gene analysis, the gene set approach is more helpful for comparing biological differences between two sets of gene expression data.
If several genes in a pathway show slight expression changes, the biological effect may be more significant than a large fold change in a single gene. Because many genes regulate multiple pathways simultaneously, they form a complex network. Systems biology approaches are increasingly being applied to genomic data analysis. Inherent in this discipline is the description of network models whose component elements (e.g., genes, proteins, etc.) interact to realize a biologic process. From this perspective, the network interactions affecting the process are more informative than the individual components. Focusing on the larger scale gene signaling pathways and biologic processes defined by interactions among groups of genes allows a more coherent picture to emerge when analyzing complex genomic datasets. It is useful to find the key nodes in the protein-protein interaction network, because these key genes are important for maintaining the overall function of the cell. Some genes have important functions across entire pathways although their expression changes are less pronounced; mild changes in their expression may have a significant impact on the function of downstream genes. Pathway analysis and interaction network analysis are vital for finding these genes, which in the past were difficult to find from single-gene expression levels alone.

High-throughput transcriptional profiling techniques have revealed gene expression signatures that accurately characterize many human cancers, and gene expression profiles have been used to provide insights into the pathogenesis of HCC. The GEO database is maintained by the National Center for Biotechnology Information (NCBI) of the US National Institutes of Health and is the largest and most comprehensive public database of gene expression profiles in the world. GEO archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community.
It has already collected the original data for more than 1 million expression profile samples. Users can query and download experiments and curated gene expression profiles relevant to their interests without duplicating research.

This study aimed to characterize the molecular events of the hepatocarcinogenic process and to identify genes that maintain the function of HCC cells. Gene expression profiles of tissue samples representing the stepwise carcinogenic process from normal liver to HCC were analyzed. Using KEGG and Biocarta pathway analysis, we gathered the pathways significantly enriched in differentially expressed genes. Furthermore, we extracted the genes in the differential pathways to build a protein interaction network and then identified the key nodes, the genes critical for maintaining HCC cell function.

Materials and Methods:
1.1 Microarray data
Data were downloaded from the GEO database. The gene expression profile data covered normal liver, liver cirrhosis, liver dysplastic nodule and hepatocellular carcinoma tissue, all from human. The GPL570 ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array) chip was used.
1.2 Analysis of microarray data
Data were analyzed using BRB-ArrayTools 4.4.0-Beta1, R version 3.0.2, and hgu133plus2.db (version 2.10.1). The gene expression profiles of cirrhosis, dysplasia and HCC liver tissues were each compared with those of normal liver tissue.
Differentially expressed genes were classified by GO, KEGG and Biocarta pathways.
1.3 Construction of protein interaction network
Genes in significant pathways were imported into STRING 9.1 to construct a protein-protein interaction network.
1.4 Network visualization and screening of key genes
The network was analyzed using Cytoscape and the CentiScaPe plug-in, and the key genes were selected according to betweenness and degree.

Results:
Eleven data sets were downloaded from the GEO database, comprising 39 normal, 33 cirrhosis, 17 dysplasia and 286 HCC samples. Differentially expressed genes were classified by GO, KEGG and Biocarta pathway analysis. The gene sets of HCC did not overlap with those of the other groups. GO terms for HCC were concentrated in transcriptional regulation of the cell cycle, DNA replication, amino acid metabolism, and regulation of mitosis. These gene sets are related to the high growth rate of tumor cells.

KEGG pathways were concentrated in energy metabolism, amino acid metabolism, DNA replication, the folate one-carbon pool, and drug metabolism. The following pathways showed higher expression levels: hsa00670 (One carbon pool by folate), hsa03030 (DNA replication), hsa04966 (Collecting duct acid secretion), hsa05410 (Hypertrophic cardiomyopathy (HCM)).

Biocarta pathways were concentrated in cell cycle regulation and DNA damage and repair. The following pathways showed higher expression levels: hvdrPathway (Control of Gene Expression by Vitamin D Receptor), hg2Pathway (Cell Cycle: G2/M Checkpoint), hptc1Pathway (Sonic Hedgehog (SHH) Receptor Ptc1 Regulates cell cycle), hcdc25Pathway (cdc25 and chk1 Regulatory Pathway in response to DNA damage), hantisensePathway (RNA polymerase III transcription), hatrbrcaPathway (Role of BRCA1, BRCA2 and ATR in Cancer Susceptibility), hmcmPathway (CDK Regulation of DNA Replication), hstathminPathway (Stathmin and breast cancer resistance to antimicrotubule agents), hrbPathway (RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage), hsmPathway (Spliceosomal Assembly), hck1Pathway (Regulation of ck1/cdk5 by type 1 glutamate receptors), hrac1Pathway (Rac 1 cell motility signaling pathway), hEfpPathway (Estrogen-responsive protein Efp controls cell cycle and breast tumors growth), hcellcyclePathway (Cyclins and Cell Cycle Regulation). The following pathways showed lower expression levels: hnuclearRsPathway (Nuclear Receptors in Lipid Metabolism and Toxicity), hlectinPathway (Lectin Induced Complement Pathway), hcompPathway (Complement Pathway), hghrelinPathway (Ghrelin: Regulation of Food Intake and Energy Homeostasis).

Because the HCC gene sets differed from those of the other groups, the genes in the Biocarta and KEGG pathways of HCC were imported into STRING and Cytoscape. 43 key genes were found: CYP2C19, ESR1, CYP2A6, CYP3A4, IGF1, FTCD, GSTA3, CYP2B6, PLG, CAT, HADH, KNG1, F2, GOT2, PPARA, PKLR, SERPINC1, MDH2, PTGS2, POLE, SHMT2, GOT1, CCND1, MDH1, CREBBP, CYP3A5, DHFR, SERPINE1, PPARG, GSTA4, TP53, CDK2, CAD, GAPDH, ATIC, POLA1, PCNA, RFC4, POLD1, TYMS, TGFB1, CDK1, PKM2.

Comparison with the NCG 4.0 database showed that the following are cancer driver genes: ATIC, CCND1, CREBBP, FTCD, MDH2, PPARG, TP53.

Conclusions:
In this study, the gene expression profile datasets of cirrhosis, dysplasia and HCC were each compared with the datasets of normal liver. There was almost no overlap between the gene sets of the different groups, and the gene expression profiles changed differently across the stepwise carcinogenic process (from preneoplastic lesions to HCC). The changes do not accumulate gradually; each stage shows a distinct molecular functional status. We found several significant gene sets and some key genes, which will provide insight for a more precise understanding of the disease mechanism and expand the opportunities for biomarker and therapeutic target discovery.

Our approach shows that viewing the genetic alterations of cancer from the "bird's eye" vantage point of the overall pathway, rather than the component genes, helps to reveal the biological theme behind large gene lists.
Our results emphasize the importance of viewing genomic data in terms of overall processes and pathways as opposed to focusing on individual genes. Future studies are ongoing to expand our datasets as more HCC data become publicly available, to rationally construct and experimentally validate pathways specific to HCC, and to corroborate our approach in other disease models.
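
The key-gene screening described in both parts (selecting network nodes by degree and betweenness) can be illustrated with a minimal Python sketch. This is not the CentiScaPe implementation and it uses no STRING data: the toy edge list, the placeholder node names A to G, and the selection rule (both measures above the network mean) are assumptions made purely for illustration. Betweenness is computed with Brandes' algorithm for an unweighted, undirected graph.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    """Number of direct interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was visited from both endpoints, so halve the sums.
    return {node: value / 2.0 for node, value in score.items()}


def key_nodes(graph):
    """Assumed rule: keep nodes above the network mean on both measures."""
    deg = degree(graph)
    btw = betweenness(graph)
    mean_deg = sum(deg.values()) / len(deg)
    mean_btw = sum(btw.values()) / len(btw)
    return sorted(n for n in graph if deg[n] > mean_deg and btw[n] > mean_btw)


# Hypothetical toy network; the letters are placeholders, not real genes.
TOY_EDGES = [("A", "B"), ("A", "C"), ("A", "D"),
             ("D", "E"), ("E", "F"), ("E", "G")]

if __name__ == "__main__":
    print(key_nodes(build_graph(TOY_EDGES)))  # prints ['A', 'D', 'E']
```

In this toy network, node D has only two neighbors but lies on every shortest path between the two halves of the graph. Its betweenness is therefore as high as that of the two hubs, which shows why betweenness is worth considering alongside degree.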
Keywords/Search Tags:HCC, Key genes, Predisposing genes, Bioinformatics, Gene expression profiles