Maize Genome Assembly and Rice Long Noncoding RNA Identification and Analysis Based on High-Throughput Sequencing

Posted on: 2018-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X W Xu
GTID: 1313330515987885
Subject: Bioinformatics
Abstract/Summary:
With technological advances and falling sequencing costs, high-throughput sequencing technologies are now widely applied in biological research. Maize and rice are important food crops and model species. Using high-throughput sequencing, we assembled and analyzed the genomes of Mo17 and Zea mays ssp. mexicana. In addition, we identified long noncoding RNAs in rice. The main results are as follows:

1. Genome assembly of Mo17 and Zea mays ssp. mexicana. Based on a genetic design, draft genomes of 2.04 Gb for Mo17 and 1.20 Gb for mexicana, with scaffold N50s of 3 Mb and 107 kb, respectively, were assembled using a meta-assembly strategy. We then used BUSCO, CEGMA, the highly conserved core gene families (coreGFs), and the annotated protein-coding genes of the B73 genome to assess the quality of the Mo17 and mexicana genomes. Although the Mo17 and mexicana draft assemblies are inferior to the B73 genome, their protein-coding regions are of acceptable quality. Transposable elements make up 79.7% and 72.8% of the Mo17 and mexicana genomes, respectively. By combining ab initio and evidence-driven gene predictions, a total of 40,003 and 31,387 high-confidence protein-coding gene models were predicted in Mo17 and mexicana, respectively. A 27 Mb inversion (measured as the B73 segment size) was identified between B73 and mexicana. A comparison of gene collinearity between B73 or mexicana and rice showed that the mexicana genome is close to the ancestral state. We also identified positively selected genes and regions of gene flow between maize and mexicana. A total of 310 positively selected genes were found, of which 133 were in mexicana. Some positively selected genes in mexicana may be related to adaptation to the plateau environment; for example, Zmex05g020691 and ZMex05g017761 may function in tolerance to drought and cold stress. Up to 10.7% of the maize genome showed evidence of gene flow from mexicana in this study.

2. Functional analysis of long intergenic noncoding RNAs (lincRNAs) in phosphate-starved rice using a competing endogenous RNA network. A total of 3170 lincRNA loci, containing 3441 transcripts, were identified in rice. Comparing the basic characteristics of lincRNAs and protein-coding genes, we found that lincRNAs generally have relatively low GC content. The average transcript length of lincRNAs was shorter than that of protein-coding genes, whereas the average exon length of lincRNAs was longer. Based on the theory of competing endogenous RNA (ceRNA), we constructed separate ceRNA networks for root and shoot. The root ceRNA network contained 4847 nodes (511 lincRNAs) with an average degree of 13.12, and the shoot network contained 4979 nodes (376 lincRNAs) with an average degree of 25.57. Enrichment analyses showed that most communities in the networks were related to biological processes involved in Pi starvation. The functions of lincRNAs were annotated using community information, which yielded 121 and 164 annotated lincRNAs in root and shoot, respectively. In combination with differential expression information, 47 and 40 key lincRNAs were identified in root and shoot, respectively. In roots, four lincRNAs were functionally annotated as related to phosphorus stress. Enrichment analysis of all key lincRNAs in root and shoot showed that key lincRNAs have tissue, temporal and spatial expression specificity.

3. Identification and analysis of long noncoding RNAs (lncRNAs) and antisense RNAs (asRNAs) in Zhenshan 97 (ZS97) and Minghui 63 (MH63). Using RNA-seq data, a total of 8579 lncRNA loci (17192 isoforms) and 1818 asRNA loci (4276 isoforms) were identified in ZS97, while 8117 lncRNA loci (17683 isoforms) and 1984 asRNA loci (4427 isoforms) were identified in MH63. Analysis of their basic characteristics showed that lncRNAs and asRNAs have lower GC content, shorter transcripts, fewer exons and lower expression levels than protein-coding genes. A conservation analysis of lncRNAs and asRNAs between ZS97 and MH63 found 3473 lncRNA and 847 asRNA sequences that are similar in the two varieties; the 3473 similar lncRNAs account for 40.48% and 42.78% of the lncRNA loci in ZS97 and MH63, respectively. In addition, 1507 lncRNAs and 435 asRNAs were found at consistent positions in ZS97 and MH63; the 1507 position-consistent lncRNAs account for 17.57% and 18.57% of the lncRNA loci in ZS97 and MH63, respectively.
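The four percentages quoted for the ZS97/MH63 comparison can be reproduced by dividing the shared lncRNA counts (3473 sequence-similar, 1507 position-consistent) by each variety's total lncRNA loci (8579 in ZS97, 8117 in MH63). The sketch below is only an arithmetic check of the numbers in the abstract, not the dissertation's pipeline; the names `LNCRNA_LOCI`, `SHARED_LNCRNAS`, `percent_of` and `shares` are hypothetical and introduced here for illustration.

```python
# Arithmetic check of the conservation percentages quoted in the abstract.
# All names below are illustrative assumptions, not taken from the dissertation.

# Total lncRNA loci per variety, as reported in the abstract.
LNCRNA_LOCI = {"ZS97": 8579, "MH63": 8117}

# lncRNAs shared between the two varieties, as reported in the abstract.
SHARED_LNCRNAS = {"sequence_similar": 3473, "position_consistent": 1507}


def percent_of(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


def shares(count: int) -> dict:
    """Express a shared count as a percentage of each variety's lncRNA loci."""
    return {variety: percent_of(count, total) for variety, total in LNCRNA_LOCI.items()}


if __name__ == "__main__":
    for label, count in SHARED_LNCRNAS.items():
        for variety, pct in shares(count).items():
            print(f"{label} / {variety}: {pct:.2f}%")
```

The computed values agree with the reported figures to within 0.01 percentage points, which suggests the percentages are lncRNA shares; the abstract does not give the corresponding asRNA proportions.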
Keywords/Search Tags:high-throughput sequencing technologies, rice, maize, genome assembly, positive selection of genes, long-noncoding RNAs, competing endogenous RNA network