Improvement Of Insect Genome Annotation Method And Analysis Of Two Insect Genomes

Posted on: 2015-07-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J D Liu
GTID: 1220330482970739
Subject: Bioinformatics
Abstract/Summary:
The genome contains all the genetic information of an organism and is the basis for understanding and modifying that organism. Genome sequencing is therefore an important task in biological research. Because genome sequencing used to be costly and time-consuming, only a few model organisms, such as Homo sapiens, Drosophila melanogaster and Mus musculus, had their genomes sequenced. Insects are the most abundant group of animal species on earth and have important impacts on human life and agricultural production. As high-throughput sequencing technology developed, the cost of genome sequencing decreased dramatically, and insect genome sequencing became feasible. Here, we developed and optimized an insect genome annotation method and analyzed the genomes of C. suppressalis and M. cingulum. The main conclusions are as follows.

1. Optimization of the insect genome annotation method

Because insect genomes are highly complex, they cannot be assembled well from data produced by second-generation sequencing technology. To obtain reliable gene information from a low-N50 draft genome, we designed the OMIGA (Optimized Maker Based Insect Genome Annotation) pipeline for insect genome annotation. The low-N50 C. suppressalis genome was used to assess the performance of OMIGA. First, hundreds of reliable, full-length coding genes were identified from the RNA-seq assembly and used to retrain the de novo gene prediction software, improving its accuracy. Second, sufficient transcript evidence was extracted from the RNA-seq data, which overcame the lack of transcript evidence. Third, three kinds of evidence, from de novo prediction, transcript-based prediction and homology-based prediction, were combined to generate consensus genes. Finally, we designed and compared four genome annotation strategies for the low-N50 C. suppressalis genome. The results of the four strategies suggest that OMIGA performs best.

2. Genome assembly, annotation and analysis of C. suppressalis

The rice stem borer (SSB), C. suppressalis, is widely distributed in the tropics of Asia and frequently causes serious damage to rice production. Its genome is a valuable resource for investigating this pest.

(1) Four libraries with insert sizes of 190 bp, 380 bp, 500 bp and 700 bp were built and sequenced, yielding 20.44 Gb of data in total. The assemblies produced by SOAPdenovo, SOAPdenovo2 and ABySS were similar; the best scaffold N50 was 5.2 kb. A 17-mer analysis indicated a genome size of 824 Mb and a heterozygosity rate of 1.5%. The GC content of the assembled genome is 35.78%.

(2) CEGMA analysis suggested that the low-N50 draft genome contains 76% of the CEGMA core genes, with 48% in full length. We used OMIGA to annotate the low-N50 assembly of the rice stem borer. OMIGA found 10,211 protein-coding genes, of which 9,720 are homologous to B. mori genes. Promoters could be identified for 5,651 genes in the SSB draft genome.

(3) In the SSB genome, we found 1,342 alternative splicing events involving 1,167 genes (about 12.2% of total genes), a lower proportion than in D. melanogaster (~70%). The difference in alternative splicing between the two insects may be caused by the low N50 of the genome and the low abundance of the transcriptome data. Of the 1,342 alternative splicing events, alternative 3' splice site events, alternative 5' splice site events, exon skipping events and intron retention events account for 42.39%, 25.39%, 17.84% and 14.38%, respectively.

(4) Using a small RNA library, we identified 262 microRNAs with the assistance of the SSB draft genome. Of these, 45 are novel and 217 are conserved in metazoans, which suggests that the low-N50 genome plays an important role in microRNA identification.

(5) In the SSB draft genome, 126 cytochrome P450 monooxygenases (CYPs) were identified. The number of CYPs in C. suppressalis is similar to the 135 CYPs of T. castaneum and greater than the 82 CYPs of B. mori and the 75 CYPs of D. plexippus. In addition, two CYPs that confer insecticide resistance, CYP314A1 and CYP4M7, were identified.

(6) Twenty-nine OBPs, 12 CSPs and the core genes of the RNAi pathway, including AGO, Aubergine, piwi, exp-5, PARP, dicer-1, dicer-2 and sid-1, were found in the SSB draft genome. Genome-level analysis shows that the SSB lacks sid-2, which may suggest that the SSB has a different RNAi mechanism from C. elegans.

3. Genome assembly, annotation and analysis of M. cingulum

The braconid wasp M. cingulum is a polyembryonic wasp that specifically parasitizes the larvae of the Asian corn borer, Ostrinia furnacalis. The wasp genome provides new insights into its parasitic behavior and genetic features.

(1) We built three libraries with insert sizes of 180 bp, 500 bp and 800 bp, and one large-insert library of 8 kb. After removing adapters and filtering low-quality sequences, 103.67 Gb of clean data were obtained. We used several algorithms to assemble contigs, build scaffolds and fill gaps, and finally obtained 132 Mb of genome sequence. The scaffold N50 was 192 kb and the contig N50 was 64 kb. According to the CEGMA assessment, the genome assembly covers 99% of the core protein-coding genes.

(2) The two parasitic wasps, M. cingulum and N. vitripennis, have a higher GC content than A. mellifera and show similar GC content distribution patterns. Repeat analysis suggests that repeat sequences account for 24.9% of the M. cingulum genome, which lies between the values for N. vitripennis (42.1%) and A. mellifera (13.6%).

(3) From the M. cingulum genome, 12,593 coding genes were identified. Compared with A. mellifera and N. vitripennis, M. cingulum genes have the fewest exons, the shortest introns and the longest exons. The gene structures of M. cingulum are compact, which may be one reason why M. cingulum has the smallest genome.
Phylogenetic analysis using the protein-coding genes of 15 species suggests that: 1) the divergence of Hymenoptera occurred between Lepidoptera and Diptera; 2) within Hymenoptera, Terebrantia is more divergent than Aculeata; 3) Ichneumonoidea is closer to Apoidea than Chalcidoidea.

(4) We identified several important gene families: nine odorant binding proteins (OBPs), 82 odorant receptors (ORs), five chemosensory proteins (CSPs), 26 gustatory receptors (GRs) and 33 ionotropic receptors (IRs). Three gene families related to detoxification, comprising 33 CYPs, nine glutathione S-transferases (GSTs) and 28 carboxyl/cholinesterases (CCEs), were also identified in the M. cingulum genome. M. cingulum has significantly fewer OBPs, ORs, CSPs and CYPs than N. vitripennis, which may be related to the difference between their diets. Gene families involved in host hunting and detoxification are useful for investigating parasitic behaviour and biological control.

(5) We found 21 venom protein genes in M. cingulum, fewer than the 71 in A. mellifera and the 27 in N. vitripennis. Homology analysis suggests that M. cingulum shares more homologous protein groups with N. vitripennis than with A. mellifera. Venom protein investigation provides clues about the role of venom proteins in parasitic attack and defense.

(6) Genes involved in sex determination, including dsx, ix, msl-3, dpn, mle, emc, mof, run, sc, Trl, Tra and Tra2, were found in the M. cingulum genome. However, the csd gene was not found in M. cingulum or N. vitripennis, indicating that the sex determination mechanism of parasitic wasps differs from that of A. mellifera. All three Hymenoptera insects have the tra gene, but Drosophila does not, suggesting that tra may function specifically in Hymenoptera sex determination.

(7) Some pathways, such as N-glycan biosynthesis, O-glycan biosynthesis and glycan degradation, are thought to participate in the immune evasion of parasitic wasps. By comparing these pathways among D. melanogaster, M. cingulum, N. vitripennis and A. mellifera, we found that the lacZ gene appears only in the parasitic wasps, and that some genes, including E3.2.1.24, AGA, FUT13, FNG and OGT, appear only in M. cingulum. These findings may offer valuable hints for the investigation of immune evasion.

(8) Polyembryonic development may be associated with several pathways, including cell adhesion molecules, adherens junction, tight junction and gap junction. In M. cingulum, we found a duplication of the integrin-β gene in the cell adhesion molecules pathway, which is thought to participate in the regulation of polyembryonic development. In addition, SMAD23 of the adherens junction pathway, four genes (SYMPK, KRAS, EXOC4, ACTBG1) of the tight junction pathway and four genes (HRAS, TUBA, TUBB, PRKG) of the gap junction pathway show significantly high expression at the egg stage. These genes may participate in the regulation of polyembryonic development.
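
The abstract relies on two assembly statistics: the N50 of contigs or scaffolds, and a genome size estimated from a k-mer (17-mer) analysis. The sketch below illustrates the standard definitions only; it is not the code used in the dissertation. The function names `n50` and `kmer_genome_size` and all input values are hypothetical, and the genome size formula (total k-mers divided by the peak k-mer depth) is the commonly used approximation, assumed here because the abstract does not state how the 824 Mb estimate was derived.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def kmer_genome_size(total_kmers, peak_depth):
    """Estimate genome size as (number of k-mers) / (peak k-mer depth).

    This is a common approximation, assumed here for illustration.
    """
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


# Hypothetical toy inputs, not data from the dissertation.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))                 # 8
print(kmer_genome_size(1_000_000, 20))  # 50000.0
```

A fragmented assembly with many short scaffolds gives a small N50, which is why the 5.2 kb scaffold N50 of the C. suppressalis draft is described as low while the 192 kb scaffold N50 of M. cingulum indicates a far more contiguous assembly.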
Keywords/Search Tags: Chilo suppressalis, Macrocentrus cingulum, Genome sequencing, Genome assembly, Genome annotation, Comparative genomics