Identification And Functional Analysis Of Mouse LincRNA Using Transcriptome Data From Various Tissues

Posted on: 2016-09-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Zhao
GTID: 1220330488475750
Subject: Genomics
Abstract/Summary:
Previous studies have demonstrated that mammalian genomes are pervasively transcribed. Protein-coding genes account for only a small proportion of the resulting transcripts, while the majority are noncoding RNAs. Noncoding RNAs play a critical role in the regulation of gene expression. As an important class of noncoding RNAs, lincRNAs serve as decoy RNAs, enhancer RNAs, nuclear scaffolds, host genes of snoRNAs, primary microRNA transcripts, ceRNAs, and so on, regulating gene expression at the transcriptional, post-transcriptional, and epigenetic levels. LincRNAs are involved in diverse biological processes such as imprinting, X-inactivation, the cell cycle, gametogenesis, and development, especially the regulation of pluripotency. In recent years, lincRNAs have been found to be related to cardiopathy, thalassemia, and Alzheimer's disease, and to be especially closely related to the progression of cancers including breast cancer, gastric cancer, lung cancer, hepatocellular carcinoma, and prostate cancer. Identifying lincRNAs to provide a more comprehensive annotation of mouse lincRNAs opens the way for future functional and evolutionary studies of mouse lincRNAs and has important implications for cancer diagnosis and treatment. In this study, we developed separate pipelines to identify lincRNAs from SOLiD and Solexa data. We further analyzed the transcriptional activity, expression profiles, and functions of these lincRNAs.

Mouse cerebrum, testis, and ES cells were sequenced on the SOLiD platform using a strand-specific rmRNA-seq method. In cerebrum, testis, and ES cells, 245,032,381, 280,932,595, and 88,306,412 reads, respectively, were mapped to the mouse genome. Using our pipeline, we identified 395,546, 465,149, and 194,996 exons in cerebrum, testis, and ES cells, respectively. To assess the accuracy of exon identification, we compared the defined actively transcribed regions to RefGene exons.
We found that most RefGene exons (~94.12%) were identified, and that the aligned length reached ~88.71%. For novel exons not found in the protein-coding gene and noncoding RNA databases, we constructed transcripts according to RNAPII and H3K36me3 signals, and annotated 17,931, 18,512, and 6,966 transcripts in cerebrum, testis, and ES cells, respectively. To evaluate the precision of these transcripts, we compared them with the RNAs annotated by the Fantom3 project. As expected, the one-to-one matching rate was about 95.62%, although the aligned length was somewhat lower (~70.99%). Regarding transcriptional activity, these transcripts showed significant associations with signals of transcriptional initiation and elongation, similar to protein-coding genes. Upstream of these transcripts, we observed significant enrichment of H3K4me3, RNAPII binding sites, and CAGE tags, which mark transcriptional start sites. H3K36me3 was also abundant along the length of these transcripts. Together, this evidence indicates that these transcripts carry their own transcriptional marks and can be transcribed independently. Finally, 3,329, 5,371, and 1,960 lincRNAs were identified in cerebrum, testis, and ES cells, respectively, after coding potential prediction with PhyloCSF and filtering of small noncoding RNAs.

In total, 11,022 lincRNAs (8,182 lincRNA genes) were found in ultra-deep RNA-seq data from 15 mouse tissues using our lincRNA identification pipeline. We then characterized the genomic features of these lincRNAs and found that they are generally shorter and have fewer exons than protein-coding genes. When we compared the mean lengths of genes, exons, and introns between lincRNA genes and protein-coding genes with the same number of exons, we were surprised to find that lincRNA genes had greater gene and intron lengths but shorter exons than protein-coding genes.
Further analysis showed that a higher proportion of LTR and LINE elements in lincRNA genes contributed to the larger introns, which in turn led to longer gene lengths compared with protein-coding genes with the same exon number. The transcriptional activity of these lincRNAs was consistent with that of the transcripts identified above from the three mouse samples sequenced by SOLiD. In terms of expression, lincRNAs were expressed at lower levels and were more strikingly tissue-specific than protein-coding genes. We performed gene set enrichment analysis (GSEA) to assign functions to these lincRNAs. As a result, numerous lincRNAs were found to be associated with protein-coding gene sets of distinct functional categories such as signal transduction, immunologic defence, meiosis, energy metabolism, and thrombin. LincRNAs in each tissue were closely related to the physiological function of that tissue. For example, lincRNAs in testis mainly played a role in reproductive development, including meiosis, development of primary sexual characteristics, sexual reproduction, and gamete generation. In cerebrum, lincRNAs were mainly involved in brain development, synaptogenesis, axonogenesis, and signal transduction. To validate the existence of the lincRNAs identified by our pipeline, we randomly selected 16 house-keeping lincRNAs and 42 tissue-specific lincRNAs for RT-PCR. Most of these lincRNAs were detected by RT-PCR.

In summary, our work provides methods to identify and analyze lincRNAs. More importantly, our results expand the collection of mouse lincRNAs and open the way for future functional and evolutionary studies of mouse lincRNAs. These lincRNAs also provide a rich source of candidates for functional experiments.
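The exon-level accuracy check mentioned in the abstract reports two figures: the share of RefGene exons that were recovered and the share of their length that was aligned. The abstract does not give the exact matching criteria, so the following is only a minimal sketch under stated assumptions: exons and detected regions are half-open (start, end) intervals on a single chromosome and strand, any overlap counts as a recovered exon, and the function names `merge_intervals` and `exon_recovery` are hypothetical, not taken from the thesis.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on one chromosome and strand.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so covered bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exon_recovery(reference: List[Interval],
                  detected: List[Interval]) -> Tuple[float, float]:
    """Return (fraction of reference exons hit, fraction of reference bases covered).

    Simplification: reference exons are assumed not to overlap each other;
    overlapping reference exons would have their shared bases counted twice.
    """
    if not reference:
        raise ValueError("reference exon list is empty")
    regions = merge_intervals(detected)
    hit_exons = 0
    covered_bases = 0
    total_bases = 0
    for ref_start, ref_end in reference:
        total_bases += ref_end - ref_start
        # Bases of this reference exon that fall inside any detected region.
        overlap = sum(
            max(0, min(ref_end, end) - max(ref_start, start))
            for start, end in regions
        )
        if overlap > 0:
            hit_exons += 1
        covered_bases += overlap
    return hit_exons / len(reference), covered_bases / total_bases


# Toy data, not values from the study.
reference_exons = [(100, 200), (300, 400), (500, 600)]
detected_regions = [(90, 150), (140, 180), (350, 450)]
exon_fraction, base_fraction = exon_recovery(reference_exons, detected_regions)
print(round(exon_fraction, 3), round(base_fraction, 3))  # prints 0.667 0.433
```

The nested scan is quadratic and is meant only to make the two metrics concrete; on genome-scale data an interval tree or a sorted sweep per chromosome would be the usual choice.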
Keywords/Search Tags: Next-generation sequencing technology, transcriptome, noncoding RNA, lincRNA, mouse