Construction Of A Plant MiRNA Genomics Database, And Genome-wide Analysis Of Intronic MiRNAs In Arabidopsis And Rice

Posted on: 2013-02-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: G D Yang
GTID: 1110330374493883  Subject: Biochemistry and Molecular Biology
Abstract/Summary:
MicroRNAs (miRNAs) are small noncoding RNAs approximately 21-24 nucleotides long that regulate the expression of specific target genes, either by messenger RNA (mRNA) degradation or by translational repression. The expression of miRNAs in plants involves transcription from MIRNA loci by RNA polymerase II (Pol II) and multi-step processing of the primary transcripts by the DCL1 complex. However, knowledge about the transcriptional regulation of plant miRNAs is limited. In this project, we developed a plant miRNA genomics database (pmiRGD). The pmiRGD is a comprehensive resource that provides information about miRNA genomic organization, experimentally verified primary transcripts, putative transcription factor binding sites (TFBSs), and deep-sequencing data for miRNAs. The interplay of these information sources, which link the genomic features of MIRNA genes with their expression profiles, could provide important clues for users seeking to discover the transcriptional regulation and function of miRNAs in planta.
In animals, the majority of miRNAs are localized within intronic regions of protein-coding genes (host genes) and have diverse functions in regulating important cellular processes. To date, few plant intronic miRNAs have been studied functionally.
In the present study, we carried out a genome-wide analysis with a particular focus on the characterization of intronic miRNAs in rice and Arabidopsis. The main results were as follows:
I Construction of the pmiRGD database
(1) miRNA data collection: The pmiRGD collects available plant miRNA data deposited in public databases and gleaned from the recent literature. In total, 9,299 pre-miRNA sequences were retrieved from miRBase (release 17), PMRD and six recently published papers. Moreover, the up-to-date genome assemblies and corresponding annotation files of plant species were carefully chosen and retrieved from TAIR10, RGAP 6.1 and Phytozome 6.0.
(2) miRNA genomic organization: For each miRNA, we identified the genomic location and putative overlapping gene by querying the genome assembly and corresponding annotated sequences using the appropriate Perl object and running a BLASTN analysis. Overall, 7,255 miRNA hairpin sequences matched 7,940 unique genomic locations, and about 10% of miRNAs reside within the introns of other genes.
(3) miRNA clusters: We classified the pri-miRNAs into two groups: miRNA clusters from intergenic regions and those from intragenic regions. In total, 67 intragenic miRNAs reside within 32 clusters, and 754 intergenic miRNAs reside within 318 clusters (MID = 3 kb).
(4) miRNA primary transcripts: 328 experimentally verified primary transcript sequences for 127 MIRNA genes were identified from three published studies in Arabidopsis thaliana and Zea mays. In addition, the mRNAs of 941 unique host genes were retrieved as pri-miRNAs for 969 intragenic miRNAs.
(5) Transcription start site (TSS) and promoter sequences: We identified the genomic location of the TSS associated with the aforementioned miRNA primary transcripts. We chose to focus on the 1 kb upstream region for miRNA promoter sequences.
Similarly, if no primary transcripts were identified, sequences are extracted in the range (-2000, 0) with respect to each pre-miRNA, but shortened if necessary so as not to overlap with the 3′-UTR of any upstream gene.
(6) Identification of transcription factor binding sites (TFBSs): To identify putative TFBSs near the TSS of miRNA primary transcripts, we employed two freely available programs, P-Match and TF-scan. All the position weight matrices (PWMs) of plant promoter elements from TRANSFAC 6.0 and 99 PWMs constructed by Megraw et al. were matched to miRNA promoters by P-Match and TF-scan, respectively.
(7) miRNA expression profiling: The pmiRGD database already includes deep-sequencing data on miRNA expression profiles at different developmental stages in Arabidopsis and rice. We extracted the read sequences and counts from the GEO database and mapped the reads to the set of miRNA precursor sequences using Bowtie, allowing at most two mismatches between the read and the hairpin sequence.
The pmiRGD website was constructed using Hypertext Markup Language (HTML) in the Microsoft Visual Studio 2008 environment, and the graphical user interface (GUI) interacts with the Access 2003 database engine. pmiRGD can be freely accessed at http://www.plantmirgo.org.
II Genome-wide analysis of intronic microRNAs in Arabidopsis and rice
(1) To identify intronic miRNAs genome-wide, we collected 1495 and 2760 miRNA precursors from miRBase (release 18), PMRD and the recent literature for Arabidopsis and rice, respectively. BLAST results revealed that 37 and 181 intronic miRNAs were located on the sense strands of the intronic regions of protein-coding genes in Arabidopsis and rice, respectively. RT-PCR results suggest that 14 and 10 intronic miRNAs were reliable in Arabidopsis and rice, respectively.
(2) Gene structure analysis revealed that one cluster was found in Arabidopsis, and 13 out of 181 intronic miRNAs resided within six clusters in rice.
The results also indicate that most clusters contain polycistronic transcription units derived from different miRNA families, which implies that these intronic miRNAs can target different mRNAs and may be involved in extremely complex regulation of genetic networks and pathways. The chromosomal distribution of intronic miRNAs suggests that 55% of the rice intronic miRNAs and 73% of the Arabidopsis intronic miRNAs might have evolved from putative genome segmental duplication events.
(3) In Arabidopsis, 79.2% of introns carrying miRNAs were shorter than 1 kb, and most of the sequences upstream of the intronic miRNA hairpins within these introns were shorter than 1 kb (91.7%). In rice, 83% of such introns were shorter than 3 kb, and 71.4% of the upstream sequences of miRNAs were shorter than 1 kb. Furthermore, we also predicted promoters within the upstream sequences of intronic miRNAs. Only one promoter in Arabidopsis and 14 promoters in rice were predicted. Together, short introns and few predicted promoter sequences indicate that most intronic miRNAs have no independent transcription units within the intronic regions in plants.
(4) Expression data for 19 and 48 intronic miRNAs were retrieved from the MPSS database for Arabidopsis and rice, respectively. The results revealed that the intronic miRNAs are transcribed at high levels. Moreover, the host protein-coding genes were examined at different stages of development using Genevestigator. Expression profiles of 21 out of 36 host genes in Arabidopsis and 152 out of 175 in rice were retrieved. The results revealed that the majority of host genes are expressed at higher levels in all (or most) stages of development. Interestingly, 1 and 26 host genes showed very specific expression patterns in Arabidopsis and rice, respectively. Expression pattern analysis of host genes suggests that the intronic miRNAs might play an important role in plant development.
(5) Using degradome sequencing data, the putative target genes of intronic miRNAs were identified.
The results showed that some target genes encode important transcription factors, which suggests that the intronic miRNAs might be involved in the regulation of genetic networks and pathways.
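The clustering criterion mentioned in part I (3) of the abstract (MID = 3 kb) can be illustrated with a minimal Python sketch. The abstract does not define MID or describe the procedure, so the following rests on assumptions: MID is taken to be the maximum distance allowed between neighbouring hairpins, clustered miRNAs are required to lie on the same chromosome and strand, and the function name `cluster_mirnas` and the tuple layout are hypothetical, not taken from pmiRGD.

```python
from collections import defaultdict

def cluster_mirnas(loci, max_distance=3000):
    """Group miRNA loci into clusters of two or more members.

    loci: iterable of (name, chrom, strand, start, end) tuples.
    ASSUMPTION: two neighbouring hairpins belong to the same cluster when
    they share chromosome and strand and the gap between them is at most
    max_distance (our reading of "MID = 3 kb"; not confirmed by the source).
    """
    by_key = defaultdict(list)
    for name, chrom, strand, start, end in loci:
        by_key[(chrom, strand)].append((start, end, name))

    clusters = []
    for key in sorted(by_key):
        current, prev_end = [], None
        # Walk the loci in genomic order and cut a cluster at every large gap.
        for start, end, name in sorted(by_key[key]):
            if current and start - prev_end > max_distance:
                if len(current) >= 2:
                    clusters.append(current)
                current, prev_end = [], None
            current.append(name)
            prev_end = end if prev_end is None else max(prev_end, end)
        if len(current) >= 2:
            clusters.append(current)
    return clusters

# Hypothetical coordinates, for illustration only.
example_loci = [
    ("mirA", "Chr1", "+", 1000, 1100),
    ("mirB", "Chr1", "+", 2500, 2600),
    ("mirC", "Chr1", "+", 9000, 9100),
    ("mirD", "Chr2", "+", 100, 200),
    ("mirE", "Chr1", "-", 1200, 1300),
]
example_clusters = cluster_mirnas(example_loci)
```

With these made-up coordinates, only mirA and mirB fall within 3 kb of each other on the same strand, so they form the single cluster. Singletons are dropped because a cluster needs at least two members.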
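The promoter rule described in part I (5), a (-2000, 0) window upstream of the pre-miRNA that is shortened so it does not overlap the 3′-UTR of an upstream gene, might be computed roughly as follows. This is a sketch under assumptions rather than the pmiRGD implementation: coordinates are taken to be 1-based and inclusive, `upstream_gene_edge` is a hypothetical parameter holding the boundary of the neighbouring gene nearest to the pre-miRNA, and strand handling is our own interpretation.

```python
def promoter_window(pre_start, pre_end, strand, upstream_gene_edge=None,
                    length=2000, chrom_length=None):
    """Return (start, end) of the upstream region, or None if nothing is left.

    ASSUMPTIONS (not stated in the source): 1-based inclusive coordinates;
    upstream_gene_edge is the position of the neighbouring upstream gene
    that lies closest to the pre-miRNA.
    """
    if strand == "+":
        # Upstream means smaller coordinates on the plus strand.
        start = max(1, pre_start - length)
        end = pre_start - 1
        if upstream_gene_edge is not None:
            start = max(start, upstream_gene_edge + 1)
    elif strand == "-":
        # Upstream means larger coordinates on the minus strand.
        start = pre_end + 1
        end = pre_end + length
        if chrom_length is not None:
            end = min(end, chrom_length)
        if upstream_gene_edge is not None:
            end = min(end, upstream_gene_edge - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if start <= end else None

# Hypothetical coordinates, for illustration only.
full_window = promoter_window(5000, 5100, "+")
trimmed_window = promoter_window(5000, 5100, "+", upstream_gene_edge=4200)
```

Returning `None` when the neighbouring gene abuts the pre-miRNA keeps callers from extracting an empty or inverted interval.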
Keywords/Search Tags: plant, genomic organization, database, promoter, TFBS, transcriptional regulation, miRNA, intronic miRNA, pri-miRNA