This thesis analyzes protein-coding genes and noncoding RNAs in plants from a bioinformatics perspective. We applied bioinformatics methods to three topics: the plant-specific TIFY gene family, the relationship between protein-coding gene duplication modes and the RNA-directed DNA methylation (RdDM) pathway, and the RNA regulatory network mediated by long noncoding RNAs (lncRNAs).

A systematic analysis of the origin and evolutionary relationships of TIFY genes across plant species has been lacking. Through exhaustive genome-wide searches of 14 genomes, we identified TIFY transcription factors and classified them into four subfamilies (TIFY, PPD, JAZ and ZML) according to their domain architectures. The results show that the TIFY domain of the ZML subfamily contains a core "TLS[F/Y]XG" motif rather than the "TIFYXG" motif that predominates in the other three subfamilies. This comprehensive survey of the TIFY family allowed us to discover a new group within the JAZ subfamily and, through phylogenetic analysis, to identify several novel conserved motifs. Evolutionary analysis indicates that whole-genome duplication (WGD) and tandem duplication contributed to the expansion of the TIFY family in plants.

Whether gene duplication modes are related to the main components of the RdDM pathway has not yet been reported. Here, we identified duplicated genes in the Arabidopsis genome and classified all genes as WGD, tandem, proximal or transposed duplicates, or singletons. The number of transposable elements associated with protein-coding genes differs among duplication modes. siRNA loci are more likely to be located on single-gene duplicates than on WGD genes. For pairs of duplicated genes, the different types of gene-body methylation (CG and CHG/CHH) show distinct patterns depending on gene origin and duplication mode.

lncRNAs have been shown to constitute a new layer of RNA regulation in mammalian transcriptomes, regulating gene expression in diverse biological processes. We conducted a systematic analysis of lncRNAs expressed in Arabidopsis thaliana by integrating deep-sequencing data from RNA-seq, ChIP-seq and degradome sequencing. By mining strand-specific RNA-seq data, we identified 390 lncRNAs in A. thaliana. Genome-wide analysis revealed the properties of these lncRNAs, including their genomic features, high GC content and low sequence conservation. A few of them are supported by H3K4me3 and/or H3K36me3 chromatin signatures. Computational prediction and degradome-based analysis indicated that two lncRNAs are targeted by miRNAs. Furthermore, we predicted that most lncRNAs can form natural antisense transcript (NAT) pairs with protein-coding transcripts as well as with other lncRNAs. Finally, we constructed an RNA regulatory network mediated by the novel lncRNAs in Arabidopsis, shedding light on the regulatory relationships between lncRNAs and other RNA molecules.
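The distinction between the ZML core motif "TLS[F/Y]XG" and the "TIFYXG" motif of the other TIFY subfamilies can be illustrated with a regular-expression scan. The following is a minimal sketch, assuming that "X" denotes any amino acid and that the TIFY-domain sequence has already been extracted. The helper name `classify_tify_core` and the example peptides are hypothetical, not sequences from this study. Subfamily assignment in the thesis rests on full domain architecture, not on this motif check alone.

```python
import re

# Core TIFY-domain motifs; "X" is assumed to mean any single amino acid.
TIFY_CORE = re.compile(r"TIFY.G")    # predominant in the TIFY, PPD and JAZ subfamilies
ZML_CORE = re.compile(r"TLS[FY].G")  # core motif reported for the ZML subfamily


def classify_tify_core(domain_seq: str) -> str:
    """Report which core motif, if any, occurs in a TIFY-domain sequence.

    Hypothetical helper for illustration; the input is upper-cased first so
    lower-case sequences are handled too.
    """
    seq = domain_seq.upper()
    if TIFY_CORE.search(seq):
        return "TIFYXG"
    if ZML_CORE.search(seq):
        return "TLS[F/Y]XG"
    return "none"


if __name__ == "__main__":
    # Made-up peptides, used only to exercise the patterns.
    for peptide in ["QLTIFYGGKV", "PQLTLSYAGK", "MKSAAEV"]:
        print(peptide, classify_tify_core(peptide))
```

Checking "TIFYXG" first mirrors the text's framing of it as the dominant motif; a sequence carrying neither core falls through to "none" and would need closer inspection.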
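The prediction that lncRNAs form natural antisense transcript pairs with protein-coding transcripts and with other lncRNAs can be approximated, for cis-NATs, by a simple geometric rule: two transcripts on the same chromosome, on opposite strands, whose coordinates overlap. The sketch below implements only that rule. The `Transcript` type, the `min_overlap` threshold and the example IDs are assumptions for illustration, and the criteria actually used in the thesis (overlap thresholds, or sequence complementarity for trans-NATs) may differ.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Transcript:
    tid: str     # transcript identifier (hypothetical)
    chrom: str   # chromosome name
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-" (strand-specific data assumed)


def overlap_len(a: Transcript, b: Transcript) -> int:
    """Length of the genomic overlap between two transcripts (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_cis_nat_pairs(transcripts, min_overlap=1):
    """Return ID pairs of opposite-strand transcripts that overlap.

    Brute-force pairwise comparison (O(n^2)); an interval tree would be
    preferable for genome-scale transcript sets.
    """
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.strand != b.strand and overlap_len(a, b) >= min_overlap:
            pairs.append((a.tid, b.tid))
    return pairs


if __name__ == "__main__":
    example = [
        Transcript("lnc1", "Chr1", 100, 500, "+"),
        Transcript("geneA", "Chr1", 400, 900, "-"),
        Transcript("geneB", "Chr1", 1000, 1200, "-"),
        Transcript("lnc2", "Chr2", 100, 300, "-"),
    ]
    print(find_cis_nat_pairs(example))
```

In the example, only lnc1 and geneA qualify: they share Chr1, lie on opposite strands and overlap by 101 bases. Raising `min_overlap` above that value would drop the pair.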