
Study On The Genetic Sequences Of Medicinal Plants Using The SMRT Sequencing Technology

Posted on: 2016-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q S Li
GTID: 1224330461976744
Subject: Pharmacognosy
Abstract/Summary:
China has rich medicinal plant resources; however, the relevant research in genomics and transcriptomics is still in its infancy. One important reason is the lack of advanced sequencing techniques and bioinformatics tools able to overcome the difficulties caused by the variety and complexity of the genetic backgrounds of medicinal plants. Single Molecule Real-Time (SMRT) DNA sequencing is classified as a third-generation sequencing technology. SMRT sequencing detects thousands of single-molecule DNA sequencing reactions in parallel by recording the temporal order of enzymatic nucleotide incorporation into a growing DNA strand within zero-mode waveguide (ZMW) nanostructures. It is currently one of the most advanced DNA sequencing technologies available. Since its advent in 2009, SMRT sequencing has helped to resolve many important questions in biology and medicine, but its application to medicinal plants remains unexplored. This work concentrates on the applications of SMRT sequencing to genomic and transcriptomic studies of medicinal plants. Representative medicinal plants (and fungi) were used as samples for hybrid de novo assembly of genome sequences, de novo assembly of chloroplast genome sequences, and full-length transcript sequencing.

In the first part of this study, de novo assembly of a whole genome aided by SMRT sequencing, SMRTbell DNA template libraries with insert sizes of 8 kb to 10 kb were prepared for the medicinal model plant Salvia miltiorrhiza. SMRT long reads were introduced into a hybrid assembly process based on the existing next-generation sequencing (NGS) dataset. A long-read-based assembly strategy was used, and ~4 Gb of corrected long reads longer than 2,000 bp were retained from ~8 Gb of SMRT reads.
These reads were first assembled on their own with Celera and then re-assembled together with the contigs from the 454 short-read assembly. The resulting assembly contains 60,349 contigs with an N50 length of 12.4 kb and a total length of 524 Mb, accounting for ~85% of the genome. Large-insert mate-pair (MP) reads were added for contig scaffolding, yielding 21,045 scaffolds with an N50 of 51 kb and a total length of 538 Mb. The accuracy of non-repeat regions within the assembly was confirmed by comparison with completed BAC sequences. The draft genome of S. miltiorrhiza provides insight into the complexity of this medicinal plant genome, and the fragmented assembly was greatly improved by introducing SMRT long reads.

In the second part of this study, a circular consensus sequencing (CCS) strategy based on SMRT sequencing was applied to de novo assembly and SNP detection in the chloroplast genomes of Fritillaria hupehensis, Fritillaria taipaiensis and Fritillaria cirrhosa. Chloroplast DNA was purified from enriched chloroplasts of pooled individuals to construct a shotgun library for each species. CCS reads were generated from polymerase reads that passed over the native dumbbell-shaped DNA templates multiple times. The complete chloroplast genome sequence was generated by mapping all reads to a draft sequence constructed in a step-by-step manner. This end-to-end, PCR-free approach eliminates possible context-specific biases in library construction and the sequencing reaction. The chloroplast genome was easily and completely assembled from the data generated by one SMRT Cell, without requiring a reference genome. The three Fritillaria chloroplast genomes exhibit a typical quadripartite structure, with sizes ranging from 151,691 bp to 152,145 bp. Comparison of the three assembled Fritillaria genomes with 34.1 kb of Sanger validation sequences revealed 100% concordance, and the intraspecies SNPs detected at a minimum variant frequency of 15% were all confirmed.
The three chloroplast genomes all mapped as circular molecules containing 135 genes. Eight rRNA genes, 38 tRNA genes and 18 intron-containing genes were identified. The genes infA, ycf15 and ycf68 contained internal stop codons, indicating that they may be pseudogenes. Twenty and 70 putative intraspecies SNPs were detected in F. taipaiensis and F. cirrhosa, with variant frequencies of 9.38%-45.45% and 9.60%-50.00%, respectively. Variation within protein-coding genes among the three species was calculated. Phylogenetic analysis of Liliales using chloroplast genome sequences supported the inference that Melanthiaceae is sister to the remaining families of Liliales, consistent with reported evidence from combined analyses of chloroplast and mitochondrial loci. Confirming the exact position of Melanthiaceae still requires more evidence, but whole chloroplast genome sequences, with their abundant genetic information, show clear advantages for evolutionary studies. This simple approach is well suited to super-barcoding, molecular marker development, genetic engineering and reconstruction of phylogenetic relationships in medicinal plants based on chloroplast genome sequences.

In the third part of this study, isoform sequencing using SMRT technology was preliminarily explored in the medicinal fungus Ganoderma lucidum, whose genome has already been assembled. Full-length cDNA was reverse transcribed from total mRNA and then normalized. SMRTbell libraries were constructed after size selection. P450 genes were used for bioinformatic confirmation. The initial results showed that SMRT sequencing provides good coverage of the transcriptome and an excellent capacity for capturing full-length transcripts. The isoform information from 5' to 3' implies a large number of alternative splicing events in this medicinal fungus.
However, the current accuracy of SMRT full-length transcripts needs to be improved for downstream analysis.

To sum up, the genomic and transcriptomic applications of SMRT sequencing technology were studied in several important medicinal plants (and fungi). Effective workflows and analysis methods were presented, showing that SMRT sequencing has high value and great potential in the field of medicinal plants. It should also have a significant impact on the selection of cultivars with good agricultural traits, on molecular identification, and on the analysis of secondary metabolic pathways of medicinal plants.
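
The contig and scaffold N50 values quoted in the abstract follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the pipeline used in the dissertation. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the S. miltiorrhiza assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half of the
        # assembly is the N50.
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

With the toy input the total length is 300, so the threshold is 150; the two longest contigs together reach 180, which makes the second one (80) the N50. A higher N50 after adding long reads or mate-pair scaffolding indicates a less fragmented assembly, which is how the improvement from contigs to scaffolds is expressed in the abstract.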
Keywords/Search Tags: Single Molecule Real-Time (SMRT) sequencing, medicinal plant, genome, chloroplast, transcriptome