In Silico Analyses Of The Strain-specific Regions In The Complete Genomic Sequences Of Klebsiella Pneumoniae

Posted on: 2017-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wen
GTID: 2404330590488959
Subject: Bio-engineering
Abstract/Summary:
In this study, we analyzed the genomes of 48 strains of Klebsiella pneumoniae by reconstructing and updating the mGenomeSubtractor (mGS) software on high-performance computer clusters. We examined the relationships among related bacteria and compared the bacterial sequences with those of the Human Microbiome Project (HMP). We identified the strain-specific regions of Klebsiella pneumoniae, as well as its drug-resistance genes and virulence genes.

First, we reconstructed and updated mGS (a web-based tool for parallel in silico subtractive hybridization analysis of multiple bacterial genomes) on the Sugon computer clusters of our laboratory. The improved tool provides online services. It meets the requirements for rapid alignment of an individual bacterial genome (2-10 MB level) against Human Microbiome Project data (1-10 GB level). mGS can rapidly identify strain-specific regions, such as bacterial genomic islands. To improve the tool's computing capacity for big-data comparisons, we upgraded the program code and updated the computing architecture on the server. With this update, the tool can compare a single bacterial genome against HMP data at the 10 GB level in a short time. In our work, we mainly used HMP data. The data source is the HMP DACC data browser, which includes data from healthy human subjects and from demonstration project subjects. The HMP data used in our work come mainly from the Reference Genome Data of healthy people (HMP-RGD) and the Metagenomic Shotgun Sequence data (HMP-MSS). We integrated the HMP-RGD and HMP-MSS data into the background data so that a single bacterial genome sequence could be compared against them.

Second, we reconstructed mGS on the Sugon computer clusters of our laboratory. A task scheduling system was introduced into the software for task management. This avoids the system crashes that result from multiple users submitting tasks at the same time. The task scheduling system dispatches the parts of the calculation that place heavy demands on CPU and memory. For the computation itself, we formulated a parallel strategy to address the excessive use of cores and memory on the computational nodes. The bacterial genome is divided evenly, and each of the smaller sequences starts its own comparison process. These processes are independent of one another, so they can be executed in parallel. In theory, the speedup is proportional to the number of processes used. A monitoring script then checks whether each process has executed. Finally, the result files are merged and the summary is shown in the client browser interface. This acceleration strategy scales well: adding new computational nodes allows more parallel processes to run, which relieves the computational bottleneck. In the era of big data, a computing strategy similar to that of mGS can provide one effective solution for analyzing huge amounts of microbial sequence data.

Third, we analyzed the whole genome sequence of an industrial strain, Klebsiella pneumoniae KCTC 2242, using mGS to identify the specific regions of the strain. We performed the mGS analysis from the following two aspects. (i) Intraspecific genetic analysis. We established a comparison library of 182 replicons (those of the 47 fully sequenced Klebsiella pneumoniae strains in GenBank, excluding the chromosome and the plasmid of Klebsiella pneumoniae KCTC 2242). Klebsiella pneumoniae KCTC 2242 was then compared against this library to identify its specific regions. (ii) Comparison with the HMP. We selected the HMP-RGD as the comparison library, which contains 11.9 GB of data (1391 strains). Klebsiella pneumoniae KCTC 2242 was compared against this library to identify the regions that are specific relative to the HMP.

Finally, we analyzed the accessory regions of the other 47 completely sequenced genomes available in GenBank. The results of the specific gene analysis showed that most of the strain-specific regions carry foreign DNA sequences. We also identified the conserved and strain-specific regions, including drug-resistance genes, virulence genes, and their flanking IS elements. These results may help in investigating the important virulence and antibiotic resistance traits of this bacterial pathogen at the level of the genome sequence.
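
The parallel strategy described in the abstract (divide the genome evenly, compare each fragment independently, then merge the results) can be illustrated with a minimal Python sketch. This is not the mGS implementation: the function names, the k-mer sharing score that stands in for a real alignment, and the `max_shared` threshold are all assumptions made for illustration. A thread pool is used here for portability; the separate comparison processes the abstract describes would correspond to swapping in `concurrent.futures.ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(sequence, n_parts):
    """Split a sequence into n_parts contiguous fragments whose lengths differ by at most 1.

    Returns a list of (start_offset, fragment) pairs.
    """
    base, extra = divmod(len(sequence), n_parts)
    fragments = []
    start = 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)
        fragments.append((start, sequence[start:start + length]))
        start += length
    return fragments


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def score_fragment(task):
    """Toy stand-in for an alignment (assumption, not the mGS method):
    the fraction of the fragment's k-mers that also occur in the background."""
    start, fragment, background, k = task
    own = kmers(fragment, k)
    shared = len(own & background) / len(own) if own else 0.0
    return start, len(fragment), shared


def find_specific_fragments(genome, background_seqs, n_parts=4, k=4, max_shared=0.5):
    """Report (start, length) of fragments that share few k-mers with the background."""
    background = set()
    for seq in background_seqs:
        background |= kmers(seq, k)

    # Each fragment is an independent task, so the tasks can run concurrently.
    tasks = [(start, frag, background, k) for start, frag in split_evenly(genome, n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(score_fragment, tasks))

    # Merge step: keep only the fragments that look strain-specific.
    return [(start, length) for start, length, shared in results if shared <= max_shared]
```

For example, calling `find_specific_fragments("ACGTACGTACGTACGT" + "G" * 16, ["ACGTACGTACGTACGT"], n_parts=2)` reports only the second half of the query, because every 4-mer of the first half occurs in the background sequence. Note that this toy version ignores k-mers that span fragment boundaries; a real tool would need overlapping fragments or a comparable safeguard.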
Keywords/Search Tags: Klebsiella pneumoniae, Comparative analysis of bacterial genome sequences, Human Microbiome Project