
Integrative Analyses For Identifying Novel HCC-related LncRNAs

Posted on: 2015-11-09  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Chen
GTID: 1224330431973910  Subject: Genetics
Abstract/Summary:
Hepatocellular carcinoma (HCC) is one of the most prevalent human cancers worldwide. According to 2008 records, about 50% of new liver cancer cases worldwide occurred in China, and approximately 80% of Chinese HCC patients are hepatitis B virus (HBV) carriers. Medical treatment has advanced, but the overall 5-year survival rate of patients with HBV-related HCC remains poor, and about 600,000 HCC patients die every year. It is therefore important to discover novel pathogenic genes in HCC and to investigate their roles in the occurrence and progression of the disease.

Long non-coding RNAs (lncRNAs) are non-coding transcripts longer than 200 nucleotides. Compared with other classes of non-coding RNA, such as microRNAs, lncRNAs have longer sequences and therefore more complex secondary and tertiary structures. Previous studies have demonstrated their regulatory functions in processes such as the cell cycle, apoptosis and differentiation. Increasing evidence also suggests that lncRNAs play important roles in tumor biology. Dysregulated lncRNA expression in cancer marks disease progression and may serve as a predictor of patient outcome.

To discover lncRNAs related to the occurrence and progression of HCC, we combined "omics" technologies with bioinformatic analyses. We hope this work will support further research on the roles lncRNAs play in HCC, the mechanisms through which they act, and their potential applications. For this purpose, our team collected tumor and adjacent non-tumor samples from 40 HCC patients. We used the Affymetrix Human Exon 1.0 ST Array and the Arraystar Human 12x135K Long Non-coding RNA Array to profile the expression of protein-coding genes and lncRNAs. We then applied integrative analyses to identify HCC-related lncRNAs.

First, we preprocessed the raw data from the Affymetrix and Arraystar microarrays separately using RMA (Robust Multi-array Average).
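The RMA preprocessing step can be illustrated with a minimal sketch of its last two stages. First, quantile normalization gives every array the same intensity distribution. Second, Tukey's median polish summarizes the log2 probe values of each probeset into one expression value per array. The sketch makes several assumptions. RMA's background-correction stage is omitted, and the input is assumed to be already background-corrected. The function names (`quantile_normalize`, `median_polish`, `rma_like`) are hypothetical, and ties are handled naively. This is not the Affymetrix or Bioconductor implementation that was actually used.

```python
import numpy as np

def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (array) the same distribution: the mean of the sorted columns."""
    order = np.argsort(x, axis=0)                   # rank order of probes within each array
    mean_by_rank = np.sort(x, axis=0).mean(axis=1)  # average value at each rank
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_by_rank          # k-th smallest value -> k-th rank mean
    return out


def median_polish(y: np.ndarray, max_iter: int = 10, eps: float = 0.01):
    """Tukey's median polish: y[i, j] = overall + row[i] + col[j] + resid[i, j]."""
    z = np.asarray(y, dtype=float).copy()
    overall, old_sum = 0.0, 0.0
    row = np.zeros(z.shape[0])  # probe effects
    col = np.zeros(z.shape[1])  # array (sample) effects
    for _ in range(max_iter):
        r_delta = np.median(z, axis=1)      # sweep row medians out of the residuals
        z -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)              # keep column effects centred
        col -= delta
        overall += delta
        c_delta = np.median(z, axis=0)      # sweep column medians out of the residuals
        z -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)              # keep row effects centred
        row -= delta
        overall += delta
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, z


def rma_like(intensities: np.ndarray, probeset_ids: np.ndarray) -> dict:
    """Probe-level matrix (probes x arrays) -> log2 expression per probeset per array.

    Assumes background correction has already been applied.
    """
    log_norm = np.log2(quantile_normalize(intensities))
    expression = {}
    for ps in np.unique(probeset_ids):
        overall, _, col, _ = median_polish(log_norm[probeset_ids == ps])
        expression[str(ps)] = overall + col  # probeset level on each array
    return expression
```

Median polish is used because it is robust to outlying probes. A few saturated or cross-hybridizing probes in a probeset barely move the median-based array effects.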
After a series of quality-control steps, we obtained the protein-coding gene and lncRNA expression profiles. We applied SVA (surrogate variable analysis) to adjust for covariates in our model. We detected 1,212 differentially expressed genes and 362 differentially expressed lncRNAs. GO analysis yielded 42 enriched GO terms, pathway analysis yielded 5 KEGG pathways, and GSEA (gene set enrichment analysis) yielded 231 MSigDB gene sets. Terms related to immune system response, cell cycle and metabolism were statistically significant in the results of all three methods. In addition, 20 statistically significant terms from our GSEA results were consistent with 7 other HCC studies, which suggests that our data are reliable.

We used a guilt-by-association method to integrate the two profiles. This produced an association matrix of 141 differentially expressed lncRNA candidates by 364 functional terms. Biclustering of this matrix revealed 4 modules containing 13 lncRNA candidates. Three modules were related to the cell cycle and one to metabolism. Pathway analysis further showed that the P53 pathway was significantly enriched among the genes of the cell cycle-related modules. To test whether the genes in each module share an expression signature, we performed GSCA (gene set co-expression analysis), which indicated that these genes were differentially co-expressed. Datasets from two independent studies (GSE22058 and GSE3500) confirmed this result.

To investigate the chromatin signatures of these lncRNA candidates, we collected 5 chromatin state maps of the HepG2 cell line from the ENCODE project. We then developed a program, CSF++, based on the codon substitution frequency (CSF) algorithm to estimate their protein-coding potential. Based on this information, we selected ASLNC18598 for further experiments. According to the chromatin maps, H3K4me1 and H3K4me2 peaks lie at its promoter, followed by an H3K36me3 peak along the transcribed region.
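The guilt-by-association step builds a lncRNA-by-term association matrix. Below is a minimal sketch of one way such a matrix could be built. The abstract does not give the exact scoring rule, so this sketch assumes one: each (lncRNA, term) cell is the mean Pearson correlation across samples between that lncRNA and the protein-coding genes annotated to the term. The function names and the scoring rule are hypothetical. A real analysis would also assess significance, for example by permutation, before biclustering.

```python
import numpy as np

def row_correlations(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pearson correlation of every row of a (m x n) with every row of b (k x n) -> m x k."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    # Zero-variance rows would divide by zero here; filter them out beforehand.
    a_n = a_c / np.linalg.norm(a_c, axis=1, keepdims=True)
    b_n = b_c / np.linalg.norm(b_c, axis=1, keepdims=True)
    return a_n @ b_n.T


def association_matrix(lnc_expr, gene_expr, gene_names, terms):
    """Hypothetical guilt-by-association score: mean correlation between each
    lncRNA and the protein-coding genes annotated to each functional term."""
    corr = row_correlations(np.asarray(lnc_expr, float), np.asarray(gene_expr, float))
    index = {g: i for i, g in enumerate(gene_names)}
    term_names = sorted(terms)
    scores = np.zeros((corr.shape[0], len(term_names)))
    for t, name in enumerate(term_names):
        members = [index[g] for g in terms[name] if g in index]
        if members:  # terms with no profiled genes keep a score of 0
            scores[:, t] = corr[:, members].mean(axis=1)
    return term_names, scores
```

In this scheme, a lncRNA that is strongly co-expressed with many genes of a term gets a large positive or negative score. Biclustering then searches the resulting matrix for groups of lncRNAs that share high scores across the same set of terms.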
Given the chromatin signatures of ASLNC18598 and its poor coding potential (CSF score = 2.30), we considered it a bona fide lncRNA. GSEA suggested that ASLNC18598 is associated with cell cycle processes, DNA damage and P53-regulated pathways.

Emerging evidence has shown that lncRNAs play important roles in epigenetics. We therefore used a supervised module map strategy and identified 41 lncRNA candidates involved in epigenetic regulation. We then examined their chromatin signatures and measured their protein-coding potential. ENCODE data showed an H3K36me3 peak along the transcribed region of ASLNC18342. Given this peak and its poor coding potential (CSF score = 2.43), we selected ASLNC18342 as the best candidate. Both the module map method and GSEA suggested that ASLNC18342 is related to epigenetics and the cell cycle. RNA-protein interaction prediction further indicated that ASLNC18342 might interact with proteins of the MLL family and mediate the H3K4me3 signature of its target genes.

The results of RACE (rapid amplification of cDNA ends) experiments suggested that ASLNC18598 and ASLNC18342 are bona fide lncRNAs. Phenotypic experiments in cell lines and nude mice also confirmed their ability to affect the cell cycle and tumorigenesis. Further mechanistic studies suggested that ASLNC18598 can interact with hnRNP-K, which is consistent with our pathway analysis. They also suggested that ASLNC18342 can interact with MLL1, which confirmed our earlier prediction. In summary, we identified ASLNC18598 and ASLNC18342, two lncRNAs involved in the cell cycle. This demonstrates that integrative analysis of "omics" profiles is a powerful strategy for discovering new HCC-related genes.
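The GSEA results for ASLNC18598 and ASLNC18342 rest on an enrichment score computed along a ranked gene list. The sketch below shows the weighted running-sum statistic used by GSEA, under two assumptions. First, genes are assumed to be ranked by some metric such as their correlation with the lncRNA's expression; the abstract does not state which ranking metric was used. Second, the permutation test that assigns significance is omitted. The function name `enrichment_score` is hypothetical and not part of any GSEA software.

```python
import numpy as np

def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum as used in GSEA.

    ranked_genes: genes sorted from most positive to most negative metric.
    scores: the ranking metric for each gene, in the same order.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    n, n_hit = len(ranked_genes), int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene_set must contain some, but not all, ranked genes")
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    hit_step = np.where(in_set, weights, 0.0) / weights[in_set].sum()  # hits step up
    miss_step = np.where(in_set, 0.0, 1.0 / (n - n_hit))                # misses step down
    running = np.cumsum(hit_step - miss_step)
    return float(running[np.argmax(np.abs(running))])  # maximum deviation from zero
```

For example, a gene set concentrated at the top of the ranking (genes most positively correlated with the lncRNA) gives a score near +1. A set concentrated at the bottom gives a score near -1.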
Keywords/Search Tags: HCC, lncRNA, cell cycle, epigenetics