Infectious diseases have afflicted human society for tens of thousands of years and remain the leading threat to human health today, causing more than 13 million deaths each year. Throughout our battle against infectious disease, novel virulent strains, including Escherichia coli O157:H7, Vibrio cholerae O139 and multidrug-resistant Mycobacterium tuberculosis, have continually emerged and posed new threats to human health.

Most emerging highly virulent isolates arose through microevolution of previously known pathogens. Microevolution of pathogenic bacteria is a dynamic process driven by interactions among pathogen, host and environment. Pathogenic bacteria continually change their genomes through point mutation, genome rearrangement, gene acquisition and loss, and copy-number variation of repeat sequences, and strains with new features emerge from this remodeling of the bacterial genome. Therefore, comparing the genome sequences of different isolates of a bacterial pathogen will improve our understanding of that pathogen's evolution, and a database of this genomic diversity will greatly help the prevention and control of infectious diseases.

The objectives of this study are:
1) to collect and integrate genomic diversity data for representative important pathogens and establish a database for research on infectious disease control and bacterial microevolution;
2) to build an online analysis platform for genotyping and phylogenetic analysis of pathogen isolates, and for rapid source tracing of pathogens during infectious disease outbreaks.

Collection and integration of the genomic diversity data of representative important pathogens
Through literature review, online database mining and data sharing with collaborating laboratories, we collected genomic diversity data for eight important pathogens, namely Yersinia pestis, Brucella, E. coli O157:H7, Mycobacterium tuberculosis, Helicobacter pylori, Shigella, Vibrio parahaemolyticus and Salmonella, covering six types of polymorphism data: MLST (multilocus sequence typing), MLVA (multiple-locus VNTR analysis), CRISPRs (clustered regularly interspaced short palindromic repeats), SNPs (single nucleotide polymorphisms), DFRs (different regions) and IS (insertion sequences).

Construction and application of the genomic diversity database of important pathogens
All acquired data fall into three types:
1) background information on the isolates of interest, including date and location of isolation, biochemical phenotypes, etc.;
2) genomic diversity data, including MLVA, SNP, DFR, MLST, CRISPR and IS information, stored in matrices or Excel files;
3) complete genome sequences of pathogenic bacterial strains and their annotation information.
To standardize and modularize the collected data, we formatted and pre-processed them with Perl scripts. We then designed the ER (entity-relationship) model, selected the free and open-source DBMS (database management system) MySQL, and implemented the database.

For microevolutionary analysis and source tracing of important pathogens, we developed a web-based analysis platform based on the MySQL database.
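The abstract does not give the ER model itself. As a rough illustration of how the three data types above could map onto relational tables, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. All table names, column names, locus names and demo rows are hypothetical, not the study's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one table per data type, plus a long-format genotype table
SCHEMA = """
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE isolate (               -- data type 1: background information
    isolate_id     INTEGER PRIMARY KEY,
    species_id     INTEGER NOT NULL REFERENCES species(species_id),
    strain_name    TEXT NOT NULL,
    isolation_year INTEGER,
    location       TEXT,
    phenotype      TEXT
);
CREATE TABLE locus (                 -- polymorphic loci of the six data types
    locus_id   INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    name       TEXT NOT NULL,
    data_type  TEXT NOT NULL
        CHECK (data_type IN ('MLST', 'MLVA', 'CRISPR', 'SNP', 'DFR', 'IS'))
);
CREATE TABLE genotype (              -- data type 2: one row per isolate and locus
    isolate_id INTEGER NOT NULL REFERENCES isolate(isolate_id),
    locus_id   INTEGER NOT NULL REFERENCES locus(locus_id),
    value      TEXT NOT NULL,
    PRIMARY KEY (isolate_id, locus_id)
);
CREATE TABLE genome (                -- data type 3: sequences and annotation
    genome_id  INTEGER PRIMARY KEY,
    isolate_id INTEGER NOT NULL REFERENCES isolate(isolate_id),
    accession  TEXT,
    sequence   TEXT,
    annotation TEXT
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up demo rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO species VALUES (1, 'Yersinia pestis')")
    con.executemany("INSERT INTO isolate VALUES (?, 1, ?, ?, ?, NULL)",
                    [(1, "demo-001", 2001, "site-A"), (2, "demo-002", 2002, "site-B")])
    con.executemany("INSERT INTO locus VALUES (?, 1, ?, 'MLVA')",
                    [(1, "VNTR-A"), (2, "VNTR-B")])
    con.executemany("INSERT INTO genotype VALUES (?, ?, ?)",
                    [(1, 1, "3"), (1, 2, "5"), (2, 1, "3"), (2, 2, "6")])
    return con


def isolates_with_allele(con, locus_name, value):
    """Keyword-style query: strains carrying a given value at a named locus."""
    rows = con.execute(
        """SELECT i.strain_name
           FROM genotype g
           JOIN isolate i ON i.isolate_id = g.isolate_id
           JOIN locus   l ON l.locus_id   = g.locus_id
           WHERE l.name = ? AND g.value = ?
           ORDER BY i.strain_name""",
        (locus_name, value)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(isolates_with_allele(db, "VNTR-B", "6"))  # ['demo-002']
```

Storing polymorphism values in a long (isolate, locus, value) table rather than one wide matrix per species lets one query pattern serve all six data types. A real MySQL deployment would choose its own column types and storage engine.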
The platform contains four modules:
1) Retrieval module: users can browse the data through field indexes or query the database by keyword.
2) Genotyping module: users can find the database genotypes that match their own isolates, based on genomic diversity data.
3) Clustering module: users can cluster their own strains together with the database isolates using genomic diversity data, and the resulting phylogenetic tree is drawn on the client page.
4) Sequence alignment module: users can align their own genome sequences to the pathogen's reference sequence to obtain a genomic polymorphism dataset.

Conclusions
The genomic diversity database of representative important pathogens covers eight important pathogens, comprising 11,123 isolates with background information, six types of genomic polymorphism data and more than 10,000 genomic polymorphism loci. Beyond data browsing, the database provides a web-based online analysis platform. During disease epidemics or bioterrorist attacks, users can retrieve polymorphism data and trace the source of an outbreak with our system. The platform has been successfully applied to source tracing of the 2009 primary pneumonic plague outbreak in Qinghai Province.
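The matching and clustering methods behind the genotyping and clustering modules are not specified in the abstract. A minimal sketch follows, assuming MLVA profiles stored as locus-to-repeat-count mappings, a categorical distance (the fraction of shared loci whose copy numbers differ) and UPGMA clustering, one common combination for MLVA data. The function names, locus names and profiles are hypothetical, and this is not necessarily the platform's own method.

```python
from itertools import combinations


def categorical_distance(a, b):
    """Fraction of loci typed in both profiles whose repeat counts differ.

    Loci absent from either profile are ignored.
    """
    shared = [k for k in a if k in b]
    if not shared:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(a[k] != b[k] for k in shared) / len(shared)


def best_matches(query, database):
    """Genotyping step: rank database isolates by distance to the query profile."""
    return sorted((categorical_distance(query, prof), iid)
                  for iid, prof in database.items())


def upgma(profiles):
    """Clustering step: UPGMA on categorical distances, as a Newick topology."""
    # cluster key -> (Newick subtree, number of isolates in the cluster)
    clusters = {iid: (iid, 1) for iid in profiles}
    # pairwise cluster distances keyed by a sorted pair of cluster keys
    dist = {tuple(sorted(p)): categorical_distance(profiles[p[0]], profiles[p[1]])
            for p in combinations(profiles, 2)}
    counter = 0
    while len(clusters) > 1:
        x, y = min(dist, key=dist.get)  # closest pair of clusters
        (tx, nx), (ty, ny) = clusters.pop(x), clusters.pop(y)
        del dist[(x, y)]
        new = f"_node{counter}"  # hypothetical key for the merged cluster
        counter += 1
        for k in clusters:
            # size-weighted average of the merged clusters' distances to k
            dxk = dist.pop(tuple(sorted((x, k))))
            dyk = dist.pop(tuple(sorted((y, k))))
            dist[tuple(sorted((new, k)))] = (nx * dxk + ny * dyk) / (nx + ny)
        clusters[new] = (f"({tx},{ty})", nx + ny)
    (tree, _), = clusters.values()
    return tree + ";"


if __name__ == "__main__":
    # Made-up MLVA profiles: locus name -> repeat copy number
    db = {
        "A": {"V1": 3, "V2": 5, "V3": 7, "V4": 2},
        "B": {"V1": 3, "V2": 5, "V3": 7, "V4": 3},
        "C": {"V1": 4, "V2": 6, "V3": 7, "V4": 2},
        "D": {"V1": 4, "V2": 6, "V3": 8, "V4": 2},
    }
    print(best_matches({"V1": 3, "V2": 5, "V3": 7, "V4": 3}, db)[0])  # (0.0, 'B')
    print(upgma(db))  # ((A,B),(C,D));
```

Branch lengths are omitted to keep the sketch short; a Newick string like this is a convenient hand-off to a tree-drawing component on the client page. Other distance measures or tree methods (for example neighbor joining for SNP data) could be substituted without changing the overall flow.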