Transcriptome Sequencing And Differential Expression Analysis Between Gossypium australe And G. arboreum

Posted on: 2014-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Tao
GTID: 2253330428459675
Subject: Crop Genetics and Breeding
Abstract/Summary:
Cotton is an important economic crop worldwide because it supplies natural textile fiber. It is also widely used as an oil crop. Cottonseeds have an oil content of 45% and are also rich in protein, so they can serve as a source of oil, protein, food and feed, which makes them commercially valuable. However, the gossypol in cultivated cotton limits the direct use of cottonseeds, because gossypol is considered toxic to humans and non-ruminant animals. Interestingly, some Australian wild cotton species possess a unique trait known as "delayed gland morphogenesis". The dormant seeds of these species contain neither gossypol nor glands; only after germination do the glands appear and gossypol synthesis begin. This trait offers new ideas for breeding low-gossypol cotton cultivars and may be the key to large-scale direct use of cottonseeds. Our study focused on two diploid cotton species, Gossypium australe and Gossypium arboreum. Three germination stages were sequenced with next-generation sequencing technology on the Illumina HiSeq 2000 platform.

After data pre-processing, the reads were aligned to the G. raimondii genome sequence using TopHat2 and analyzed with a genome-guided strategy. The proportion of perfectly aligned reads was relatively low because of differences between the genomes. Continuing with the genome-guided strategy could cause the loss of some unique A- and G-transcripts, so we chose to analyze the data further with a de novo strategy.

The cDNA libraries of the three stages were pooled for assembly with Trinity to represent the whole germination transcriptome of G. arboreum and of G. australe. To detect genes differentially expressed between G. arboreum and G. australe during germination, all six cDNA libraries were also assembled together with Trinity as a reference transcriptome. The three data sets, corresponding to G. arboreum ("A"), G. australe ("G"), and G. australe & G. arboreum ("A&G"), were assembled into 226,184, 213,257, and 275,434 transcripts, respectively, which clustered into 61,048, 47,908, and 72,985 individual clusters. The N50 values of the unigenes were high: 1,710, 1,544, and 1,743, respectively.

The three assembled unigene sets were first used for homology searches against the UniProt/Swiss-Prot, UniProt/TrEMBL and NCBI RefSeq Plant protein databases using the BLASTx algorithm. The unigenes were also searched against the CDS and protein sequences of the G. raimondii genome project using BLASTn and BLASTx, respectively. Combining the annotation results from all three protein databases gave 21,987 ("A"), 17,209 ("G") and 25,325 ("A&G") unigenes with BLASTx hits. The annotated unigenes were then assigned Gene Ontology (GO) terms for functional classification: 18,766 (85.4%, "A"), 14,552 (84.6%, "G") and 21,374 (84.4%, "A&G") of the annotated unigenes could be assigned to one or more GO terms.

A total of 13,884 differentially expressed genes were identified. Clustering and functional classification showed that these genes were enriched mainly in biosynthesis of secondary metabolites, binding and catalytic activity, lipid metabolic process, carbohydrate metabolic process, etc. Many genes in the mevalonate (MVA) and MEP/DOXP pathways showed completely opposite regulation patterns in the two species: they were up-regulated in G. australe and down-regulated in G. arboreum during germination.

To further explore potential transcription factors related to gland formation, the fine-mapping results for the Gl2e gene were used. We extracted the D-genome sequence between two published markers, NAU2251b and CIR362, and used FGENESH+ to predict gene models. A total of 137 ORFs were found, and BLAST results showed that 29 of them may be transcription factors; 23 of these 29 were found in the RNA-seq results.

Terpene synthases (TPSs) play important roles in cotton secondary metabolism. Cadinene synthases are mainly responsible for gossypol biosynthesis and have therefore been studied extensively, but little is known about other cotton TPS genes. We used the D-genome sequence to find all genes of the TPS family in G. raimondii. Two Pfam seed files, PF01397 and PF03936, were downloaded and searched against the D-genome peptides, yielding 81 TPS-related sequences. Applying the same method to the A- and G-transcriptomes yielded 12 (A) and 9 (G) TPS-related sequences. We further downloaded TPSs of other plant species, giving 198 TPS gene sequences in total, and carried out phylogenetic analyses on them. The results showed that cotton TPSs are distributed across all TPS subfamilies except TPS-d, which is specific to gymnosperms.

Q-PCR validation results were highly consistent with the RNA-seq data. Tissue-specific expression analyses showed that one of the highly expressed TPS genes was expressed specifically in roots and showed low expression levels in leaves, stems and 10 DPA ovules.

We sequenced the transcriptome of a wild cotton species using next-generation sequencing technology. G. australe was chosen for its delayed gland morphogenesis. Comparative genomics and transcriptome sequencing analyses were applied to describe transcript-level regulation during seed germination in G. australe and G. arboreum. Our results provide valuable resources and evidence for understanding gossypol biosynthesis and gland formation. The transcriptome data should also help guide the assembly of the A genome and of the cultivated tetraploid cotton genome.
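The N50 values reported for the three unigene sets summarize assembly contiguity. The abstract does not say which tool produced them, so the following is only a minimal sketch of the standard N50 definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not data from the thesis.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the sequence at which the
    running total first reaches half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one sequence length")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length

    # Not reachable for non-empty input: the running total always
    # reaches the full sum, which is at least half of itself.
    return ordered[-1]


if __name__ == "__main__":
    # Hypothetical toy lengths (total 32, half 16): the two longest
    # sequences (8 + 8) already cover half, so the N50 is 8.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_lengths))  # prints 8
```

In practice the lengths would be read from the assembled FASTA file. A higher N50 indicates that more of the assembly sits in long sequences, but it says nothing on its own about completeness or correctness.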
Keywords/Search Tags: Gossypium australe, Gossypium arboreum, gland, gossypol, germination, RNA-seq, terpene synthase