
Mining And Application Of Molecular Markers From EST Database And Transcriptome Sequencing In Tea Plant (Camellia sinensis)

Posted on: 2012-08-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Wang
Full Text: PDF
GTID: 1103330335979304
Subject: Tea
Abstract/Summary:
Little genetic and genomic information is available for the tea plant (Camellia sinensis), and effective DNA markers are especially scarce. In this study, the existing public EST database was mined to develop SSR and SNP markers. In addition, transcriptome data from tea flowers were obtained by high-throughput RNA-seq and used to identify SSR sites and develop markers. The results of this study are summarized as follows:

(1) By clustering the 12,757 tea ESTs downloaded from NCBI, a tea unigene database containing 4,000 unigenes was built. The redundancy rate of the tea ESTs was approximately 68.2%. The characteristics of SSR distribution were also described. A total of 206 pairs of SSR primers were designed with Primer 5, of which 59 were polymorphic.

(2) As applications of the SSR primers developed above, two questions were studied: the sampling strategy for assessing the genetic diversity of tea landraces, and the genetic diversity and differentiation of the Longjing tea landrace. The most suitable genetic diversity parameter for sampling tea landraces was the number of alleles (NA); when the number of alleles per SSR locus was 5, at least 24 individual tea plants were needed to capture 90% of the total genetic diversity. The level of genetic diversity within the Longjing tea landrace was high. The average polymorphism information content (PIC) was 0.4382; 33.3% of the SSR sites in this study were classified as highly polymorphic and 62.5% as moderately polymorphic. The Hardy-Weinberg equilibrium (HWE) test showed that 66.7% of the SSR sites deviated from HWE. AMOVA showed that genetic differentiation among the five populations of the Longjing tea landrace was low.

(3) An EST-SNP development system for tea was preliminarily established, and the SNP distribution was characterized. The frequency of coding-region SNPs in tea was approximately 0.58%, that is, roughly one SNP per 200 bp of tea EST sequence. Furthermore, the heterozygosity rate of the tea genome was inferred to be 0.38%, or on average about one heterozygous site per 300 bp. A total of 818 candidate SNPs were identified from 237 multigene clusters. Then 25 pairs of SNP primers were designed, and 75% of these sites were confirmed to be polymorphic by DNA sequencing.

(4) Using high-throughput Illumina RNA-seq, the transcriptome of Camellia sinensis flowers was analyzed and 75,531 unigenes were obtained. The average sequencing depth and coverage were 23.45 and 0.895, respectively. The distribution of RPKM values across all unigenes showed that genes with low and medium expression levels dominated the gene expression pattern of tea flowers. Sequence similarity searches against four public databases (NR, NCBI COGs, InterPro, KEGG) annotated 55,088 unigenes.

(5) The SSR sites in the flower transcriptome of Camellia sinensis were mined in a high-throughput manner. A total of 12,582 SSRs were present in 10,290 unigenes, an occurrence frequency of 16.66%. In total, 340 SSR motifs were found, and dinucleotide repeats were the most abundant (44.99%). The length distribution of the SSRs deviated markedly from a normal distribution: short SSRs under 15 bp were the most numerous, while SSRs longer than 30 bp made up only a small proportion.

(6) A total of 2,633 pairs of SSR primers were designed automatically; primer design succeeded for 42.85% of the SSR sites.

These methods were efficient for functional gene discovery and useful for molecular marker-assisted breeding of tea.
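The SSR mining summarized in (1) and (5) can be illustrated with a minimal sketch. The abstract does not name the SSR search tool or the repeat thresholds that were used, so everything below is an assumption for illustration: the function name `find_ssrs`, the regular-expression approach, and the minimum repeat counts in `MIN_REPEATS` are hypothetical and are not the thesis's actual pipeline.

```python
import re

# Minimum number of repeats per motif length (2-6 bp).
# Assumed thresholds for illustration only; not taken from the thesis.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"), so each SSR is reported once under
            # its shortest motif, or not at all for mononucleotide runs.
            is_compound = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence with one dinucleotide and one trinucleotide SSR.
    example = "GGC" + "AG" * 8 + "GCA" + "CTT" * 5 + "GA"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Run over a set of unigenes, the number of hits divided by the number of sequences gives an occurrence frequency of the kind reported in (5), where 12,582 SSRs across 75,531 unigenes corresponds to 16.66%.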
Keywords/Search Tags: Camellia sinensis, EST, Bioinformatics, Molecular Markers, RNA-Seq