
Transcriptome Sequencing and De Novo Analysis for Half Smooth Tongue Sole (Cynoglossus semilaevis) and Japanese Flounder (Paralichthys olivaceus)

Posted on: 2015-03-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W J Wang | Full Text: PDF
GTID: 1223330431984552 | Subject: Marine biology
Abstract/Summary:
Half smooth tongue sole (Cynoglossus semilaevis) and Japanese flounder (Paralichthys olivaceus) are two valuable aquaculture fish species in China. Sexual dimorphism, especially the difference in growth rate and body size between the sexes, makes these two fish good models for investigating the mechanisms responsible for such dimorphism, both for fundamental questions in evolution and for applied topics in aquaculture. However, the lack of available genomic data has hindered progress. The recent advent of high-throughput sequencing technologies, such as 454 pyrosequencing and Illumina sequencing, provides a robust tool for "omics" studies of non-model species. In this study, de novo transcriptome sequencing of half smooth tongue sole and Japanese flounder was performed using 454 pyrosequencing and Illumina sequencing technology, respectively.

1) Half smooth tongue sole

A total of 749,954 reads with an average length of 235 bp were generated using a single 454 sequencing run on one full PicoTiter plate, for a total of 176 M bases. After removal of adapters and of short and low-quality sequences, 584,419 high-quality reads with an average length of 206 bp were retained. Thus 77.9% of the raw reads contained useful sequence information that could be used for subsequent assembly. The trimmed and size-selected reads were assembled into 62,632 isotigs, with 98,262 remaining as singlets. Isotigs ranged from 100 bp to 1,665 bp, with an average length of 272 bp and an N50 of 303 bp. The average sequencing coverage, defined as the number of reads assembled into a given contig, was 10.2. The assembly result was comparable to those of other fish transcriptomes obtained using 454 pyrosequencing.

Several complementary approaches were used to annotate the assembled sequences. Firstly, unigenes were compared against the public protein databases using Blastx. This procedure successfully assigned gene names to 26,589 (17.7%) sequences.
Of these 26,589 annotated sequences, only 349 were annotated from known flatfish information, indicating the lack of flatfish information in the public databases. Secondly, sequences that had matches in public protein databases were annotated with Gene Ontology (GO) terms; GO provides a dynamic, controlled vocabulary and hierarchical relationships for representing information on molecular function, cellular component and biological process. In total, 3,451 unigenes were annotated with 17,113 GO terms, of which 1,921 records were annotated with a cellular component (GO:0005575), 3,020 with a molecular function (GO:0003674), and 2,561 with a biological process (GO:0008150). Lastly, the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway approach for higher-order functional annotation was implemented using the web tool KAAS. A total of 2,362 unigenes were mapped to 186 different pathways.

A search of the transcriptome data revealed that 1,898 unigenes contained putative transposable elements (TEs), of which 904 were retroelements and 994 were DNA transposons. The most frequent retroelement was Gypsy (266, 29.4%), followed by Jockey (151, 16.7%) and Copia (104, 11.5%), while the most frequent DNA transposon was CACTA (248, 24.9%), followed by hAT (131, 13.2%) and Tc1-Mariner (124, 12.5%).

Putative molecular markers were identified in the dataset. In total, 7,869 SSRs, 21,234 SNPs and 13,370 single-nucleotide indels were found. For SSRs, the most frequent repeat motifs were dinucleotides, which accounted for 64.3% of all SSRs, followed by trinucleotides (31.1%), tetranucleotides (3.5%), pentanucleotides (0.7%) and hexanucleotides (0.4%). Based on the distribution of SSR motifs, AC was the most common motif (20.2%), while CAG and AAAC were the most abundant motifs among tri- and tetranucleotides, respectively. For SNPs, there were 14,333 transitions and 6,901 transversions. The overall frequency of all SNP types in the transcriptome, including indels, was 1 per 491 bp.
Of these SNPs, 4,162 (19.6%) were identified from contigs with annotation information. These putative SNPs are expected to be useful for genetic and breeding studies in half smooth tongue sole.

2) Japanese flounder

One lane of 2×90 paired-end sequencing produced more than 27 million raw reads, containing nearly 2.5 gigabases of sequence. After removal of low-quality sequences, 24 M reads longer than 25 bp were retained. The clean reads accounted for 88% of the raw reads, their average length was 75.2 bp, and 1.8 G bases had a Q value above 20. Two different assemblers (SOAPdenovo and Trinity) were used to assemble the clean reads into consensus sequences. SOAPdenovo generated 119,370 scaffolds ranging from 150 bp to 9,339 bp, with an average length of 469 bp and an N50 of 626 bp. The total scaffold length was 56 M bases with an N ratio of 0.09%, and 9.41% (11,232) of scaffolds were longer than 1 kb. Trinity generated 97,460 contigs ranging from 201 bp to 10,284 bp. The average length of these contigs was 643 bp, with an N50 of 910 bp, and 11.43% of contigs (16,211) were longer than 1 kb. The total length of contigs was 62.6 M. Merging the two sets of assemblies generated 107,318 non-redundant sequences with a longer average length and N50 (646 bp and 1,081 bp, respectively) and a larger total size (69.4 M).

Several complementary approaches were used to annotate the assembled sequences. Firstly, unigenes were compared against the public protein databases using Blastx. By similarity search, nearly half of the sequences (51,563, 48.1%) obtained protein-coding information. Secondly, GO annotation assigned 37,541 GO terms to 17,833 sequences. Thirdly, by KEGG pathway analysis, 7,811 sequences were mapped to 310 different pathways.
Interestingly, Pathways in cancer (05200) was the best-represented pathway, to which 163 unigenes were mapped. Protein coding sequences (CDS) were identified by similarity search for unigenes with Blastx annotation, and an additional 5,516 CDS were predicted for unigenes without Blastx annotation. By searching against the RepBase database, a total of 11,021 potential transposable elements were identified, including 5,380 retroelements and 5,641 DNA transposons.

The Trinity assembler provides a tool for analyzing alternative splicing events in a transcriptome. A total of 6,941 unigenes showed potential alternative splicing events. Four of ten randomly selected unigenes were verified by Sanger sequencing. The use of a doubled haploid Japanese flounder individual allowed us to analyze gene duplication events. In total, we identified 1,859 unigenes that represent potentially duplicated genes in the Japanese flounder genome.
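
The N50 values reported for both assemblies can be illustrated with a short sketch. It assumes the conventional definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the dissertation, and the assemblers used in the study may handle edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (an assumption here, not taken from the
    dissertation): sort lengths from longest to shortest and return the
    length at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in bp (not from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 weights sequences by their length, it is usually larger than the mean length when an assembly contains many short sequences, which is consistent with the pattern in the figures above (for example, an average of 643 bp against an N50 of 910 bp for the Trinity contigs).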
Keywords/Search Tags: Half smooth tongue sole, Japanese flounder, transcriptome, 454 pyrosequencing, Solexa sequencing