Construction of the Silkworm Genome Database and Its Application

Posted on: 2009-12-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Duan
GTID: 1100360242497023
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The silkworm, Bombyx mori, is an economically important insect and a model for Lepidopteran insects. The silkworm genome sequencing project benefits research in several ways. On the one hand, it facilitates basic research on silkworm physiology, biochemistry and metabolism, and helps reveal the molecular mechanism of silk production. These advances provide a foundation for reforming traditional sericulture through modern biotechnology. On the other hand, it provides new methods for controlling agricultural and forest pests. In addition, researchers are also interested in using the silkworm as a bioreactor.

Following the Human Genome Project and genome projects for other model organisms, 3x and 6x draft genome sequences of the silkworm were completed by a Japanese group and a Chinese group, respectively. However, the draft sequences are incomplete, and some genes are fragmented. To obtain a more complete genome sequence, the two groups cooperated: they exchanged data, filled genomic gaps and developed more molecular markers. In 2007, a fine genome sequence of the silkworm was successfully assembled.

The completion of the fine assembly lays a foundation for studying gene function. However, the main problem researchers face is how to conveniently access the genome data to obtain information or clues about genes. To solve this problem, we predicted the functions of all silkworm genes by several methods and obtained gene expression profiles by microarray. Based on these and other related data, we reconstructed the silkworm genome database. This database integrates various data resources and bioinformatics tools and provides a useful platform for the silkworm and Lepidopteran research community. In addition, the C2H2 zinc-finger protein genes were identified based on the fine genome sequence and the database. The
main results are as follows.

1. Annotation of gene function in the silkworm

Gene functions were predicted by a variety of methods. This information provides important clues for further study of gene function.

(1) Gene function predicted by sequence similarity searches: This method rests on the observation that similar sequences often share similar functions. A total of 14623 genes were used to query the NCBI nr database. As a result, 12246 genes have homologs (E-value<1E-5), accounting for 83.7% of all genes. Of these, 5250 genes are highly conserved (E-value<1E-80). These conserved genes are involved in DNA replication, energy metabolism, protein synthesis, lipid metabolism, carbohydrate metabolism and other basic physiological processes. The remaining 2377 genes have no homolog. These are likely silkworm-specific genes, so we suppose that they may be related to special physiological functions of the silkworm.

(2) Gene function predicted by protein domains: The function of a gene mainly refers to the function of the protein it encodes. Protein domains play a very important role in how a gene product works, so the domain content of a gene gives clues to its function. All silkworm genes were used to query the InterPro database. As a result, 8522 genes have domains, accounting for 58.2% of all genes. There are 2509 kinds of domains in total, and the most prevalent are C2H2, LRR1, WD40, Ank, I-set and so on. On the one hand, this work compensates for the limitations of sequence similarity searches: 78 genes were annotated in this way but not by the homology search. On the other hand, domain-based annotation may give more comprehensive information, especially for multi-domain genes.

(3) Gene function prediction based on the COG database: Gene function was also predicted with the COG (Clusters of Orthologous Groups) database. The result
showed that 7839 genes could be classified into the corresponding orthologous clusters (E-value<1E-5). Among them, the gene groups "General function prediction only", "signal transduction mechanisms", "Posttranslational modification", "protein turnover and chaperones" and "Lipid transport and metabolism" are the most enriched clusters; they include 1602, 987, 593, 436 and 391 genes, respectively. In addition, we annotated genes with the LSE (Lineage Specific Expansion) database. The result showed that 533 genes could be classified into the corresponding LSE clusters (E-value<1E-5). Among them, 475 silkworm genes belong to Drosophila lineage-specific gene groups. This indicates that these genes are insect-specific and may be related to insect-specific physiology.

Comparative analysis showed that 6580 genes can be annotated by all three of the above methods. Each method has its own merits, and combining them reflects gene function more comprehensively.

2. Analysis of silkworm microarray data and database construction

The spatio-temporal expression of genes controls development, differentiation, the cell cycle, senescence, programmed cell death and other processes. To acquire more information about gene expression at the whole-genome scale, our lab and the National Engineering Center for Beijing Biochip Technology jointly constructed the first genome-wide oligonucleotide microarray of the silkworm. This microarray was used to measure gene expression in 10 tissues/organs: midgut, integument, head, hemocyte, testis, ovary, anterior/median silk gland (A/MSG), posterior silk gland (PSG), fat body and malpighian tubule. In this study, we analyzed these microarray data and confirmed the reliability of the results. We made the silkworm microarray data publicly accessible through our database.

Analysis of the microarray data showed that 10393 genes were detected in at least one
tissue, accounting for 44.5% of all genes. There are 306 genes that are abundantly expressed in every tissue; most of them are housekeeping genes, such as ribosomal protein genes, tubulin genes, translation elongation factor genes, actin genes and so on. Data analysis showed that at least 1642 genes are tissue-specifically expressed. Most of these are found in testis, midgut and malpighian tubule, which have 1104, 216 and 110 tissue-specific genes, respectively. Based on the gene annotation, these tissue-specific genes are related to the function of the corresponding tissue. Data analysis also suggested that at least 209 genes are co-expressed between tissues, reflecting similar physiological functions or cellular components between those tissues.

We validated the reliability of the microarray data with several approaches, including bioinformatics analysis and experiments. The results suggest that the microarray data are reliable and that the analysis methods used in the study are appropriate. Finally, we constructed a silkworm microarray database, BmMDB, based on the microarray data so that the expression profiles of silkworm genes can be conveniently accessed. This gene expression information will facilitate further study of gene function.

3. Construction of the silkworm genome database

After the completion of the updated assembly of the silkworm genome sequence, the quality of the genome sequence was greatly improved and genes were predicted more accurately. In addition, 87.4% of the genomic sequence could be anchored on the chromosomes. To provide access to these data and integrate more information, we reconstructed the silkworm genome database. The new database, SilkDB, can be accessed at http://silkworm.swu.edu.cn/silkdb or http://silkworm.genomics.org.cn. In the new database, information is navigated with the GBrowse genome browser instead of the Mapview used in the previous
database. With GBrowse, users can access any region of the genome. The database also provides a variety of search methods. One is searching by keyword, gene ID and so on. Another is homology search, using BLAST against EST, genome and gene sequences. In addition, we developed the Silkworm Chromosomes Browser (SCB) and SilkMap to make silkworm data resources easy to reach.

The Gene Page is the heart of the silkworm database. It displays detailed gene information, such as domain information, GO classification, homology search annotation, gene family, gene expression information, reference information, gene sequence, and so on. This information is the basis for further research on gene function. New information can easily be added to the database as it appears. In the next step, we will correct erroneous data in the database and add more experimental data, such as expression information obtained from SAGE and phenotypes resulting from RNA interference (RNAi) or gene mutation. In short, the silkworm genome database will play an important role in accelerating functional studies of silkworm genes.

4. Identification of C2H2 ZFPs in the silkworm

The C2H2 zinc-finger domain is characterized by sequence-specific DNA binding. Proteins that contain this domain are called C2H2 zinc-finger proteins (ZFPs). Most C2H2 ZFPs function as sequence-specific DNA-binding transcription factors and play important roles in development, cell differentiation, metamorphosis, and so on. By searching the silkworm genome with an HMM of C2H2 zinc-fingers (PF00096), we systematically identified 338 C2H2 ZFP genes in the silkworm genome, which constitute 2.3% of the annotated genes. Compared to Drosophila, the silkworm has significantly more C2H2 ZFP genes and C2H2 zinc-fingers.
Further study showed that the silkworm has more genes that contain 10 zinc-fingers per gene. C2H2 ZFPs often have domains other than the C2H2 zinc-finger. These are called zinc-finger associated domains, and may assist ZFPs in activating or repressing the expression of target genes. In Bombyx mori, there are 90 genes with zinc-finger associated domains. Of these domains, ZAD is the most prevalent, with 50 ZADs in 50 genes. Comparative analysis showed that there is no ZAD in Caenorhabditis elegans and only one in human. This indicates that ZAD is a domain that has undergone lineage-specific expansion in insects. We speculate that the ZAD domain may be related to special physiological or metabolic processes in insects.

The distribution of C2H2 ZFP genes in the genome was investigated. A total of 324 C2H2 ZFP genes could be located on chromosomes. About 241 genes are concentrated in 59 tandem duplication clusters (threshold set at 500 kb for neighboring genes). The largest cluster, located on chromosome 24, consists of 43 C2H2 ZFP genes in a 650 kb fragment. Most of the ZFP genes are tandemly clustered on chromosomes, indicating that tandem duplication plays an important role in expanding the number of these genes. The cluster organization also results in an asymmetric distribution of these genes across chromosomes: most C2H2 ZFP genes are concentrated on chromosomes 11, 15 and 24, which together account for 38.8% of all C2H2 ZFP genes.

Gene family information helps in understanding gene function. By comparison with the C2H2 ZFPs of H. sapiens, C. elegans and D. melanogaster, silkworm C2H2 ZFPs were classified into 75 gene families, 63 of which are evolutionarily conserved families, i.e. they have members from D. melanogaster, C. elegans or H. sapiens. Among the evolutionarily conserved families, there are 32 families that have members only from silkworm and D.
melanogaster, indicating that these genes are insect-specific. In addition, there are 12 families that appear only in silkworm. Counting the singleton genes as well, there are 188 silkworm species-specific C2H2 ZFP genes, whereas C. elegans, D. melanogaster and H. sapiens have only 120, 125 and 160 species-specific genes, respectively. This suggests that the silkworm has more species-specific C2H2 ZFP genes than these other organisms. Because the silkworm is characterized by silk production and metamorphosis, further study of these genes may uncover the relationship between silkworm species-specific genes and species-specific biological processes.

Day 3 of the fifth instar is an important stage for silk protein synthesis and preparation for metamorphosis. In this study, we examined the expression patterns of the silkworm C2H2 ZFP genes in different tissues based on microarray data. A total of 132 C2H2 ZFP genes were detected as expressed in at least one of the investigated tissues. Of these, 33 genes are expressed in every tissue, and 14 genes are expressed exclusively in one tissue. The results indicate that these genes may play important roles at this stage. For example, among the genes expressed in all tissues, BmZFP286 belongs to the DNJA5 family, so it may be related to protein folding at this stage. BmZFP104 belongs to the Ab family, and we speculate that it may coordinate the movement of tissues or organs at this stage. BmZFP160 shares high similarity with Drosophila crol. We speculate that this gene may be an early response gene for ecdysone and is likely induced by ecdysone.

In this study, we identified the C2H2 ZFP genes in the silkworm and acquired basic information about them, such as chromosomal distribution, gene family information and expression information. This information will be useful for further functional studies of these genes.
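
The tandem-cluster criterion mentioned in the abstract (neighboring genes on the same chromosome within a 500 kb threshold) can be illustrated with a minimal Python sketch. This is not the procedure used in the dissertation: the function name, the input format, the gene identifiers and the choice to measure distance between gene start positions are all assumptions made for illustration.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap=500_000, min_size=2):
    """Group genes into tandem clusters.

    genes: iterable of (gene_id, chromosome, start) tuples (hypothetical format).
    Two neighboring genes on the same chromosome join the same cluster when
    their start positions are at most max_gap bases apart (an assumption;
    the abstract only gives the 500 kb threshold).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        prev_start = None
        for start, gene_id in sorted(by_chrom[chrom]):
            # A gap larger than the threshold closes the current run.
            if prev_start is not None and start - prev_start > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            current.append(gene_id)
            prev_start = start
        # Flush the last run on this chromosome.
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


# Made-up coordinates for illustration only.
example = [
    ("ZFP_a", 24, 100_000),
    ("ZFP_b", 24, 400_000),
    ("ZFP_c", 24, 1_500_000),
    ("ZFP_d", 11, 50_000),
    ("ZFP_e", 24, 700_000),
]
print(tandem_clusters(example))
```

Chaining neighbors in this way means a cluster can span more than the threshold overall, as long as each consecutive pair is close enough; this is consistent with the largest reported cluster covering a 650 kb fragment under a 500 kb neighbor threshold.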
Keywords/Search Tags: silkworm, genome, gene annotation, gene expression, zinc-finger protein, database