
Data Mining For Bacterial Genomic Island-related Modules

Posted on: 2015-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: L M Liu
GTID: 2180330452966899
Subject: Biology
Abstract/Summary:
Bacteria possess plastic genomes, and genomic islands are one of the key factors driving the evolution of their genetic diversity. This study employed text mining and comparative genomics methodologies to investigate bacterial genomic islands and related or associated genetic modules and factors, including bacterial type IV secretion systems (T4SSs), non-coding RNA (ncRNA) genes and duplicated genes.

The web-based tool mGenomeSubtractor, which performs parallel in silico subtractive hybridization analysis of multiple bacterial genomes, can identify putative genomic islands by locating the variable regions of a genome. To enhance its processing capability and stability, its computer programs were modified. Data management and retrieval methods were improved to avoid data redundancy and to facilitate data update and maintenance. A task management strategy was developed using a job scheduling system to organize multiple tasks. In addition, key parameters of mpiBLAST, the rate-limiting step, were optimized, which dramatically improved efficiency. Meanwhile, the local database of fully sequenced bacterial genomes was updated, with 2078 more replicons available in the drop-down menu for analysis. Moreover, we developed a functional analysis module for partially sequenced genomes.

Some bacterial genomic islands use T4SSs for conjugation. We collected 10752 core components mapping to 811 T4SSs and 1884 T4SS effectors found in representatives of 289 bacterial species, together with 925 directly related references. We improved the T4SS classification scheme and further classified the identified T4SSs. There are two putative T4SSs in the Klebsiella pneumoniae HS11286 chromosome and one in a plasmid, with no effector detected; their functions remain to be studied.

ncRNA genes are located in the context of some genomic islands in bacterial genomes, as genomic islands preferentially insert at tRNA and tmRNA genes. In this study, we assessed three frequently used ncRNA gene-prediction tools on different bacterial genomes. sRNAPredict had higher specificity and positive predictive value, but lower sensitivity, than PORTRAIT. The performance of both tools varied with the G+C content of the selected strains. The G+C content-associated matrix obtained here slightly improved the average accuracy of sRNAscanner. Conserved sequence features of ncRNA gene promoters and terminators in genomes sharing similar G+C contents may therefore help to improve bacterial ncRNA gene prediction. Further, potential ncRNA genes in Klebsiella pneumoniae HS11286 were identified and their relationship with genomic islands was explored.

Dosage and functional compensation among duplicated genes may contribute to the occurrence of bacterial antibiotic resistance. Although gene duplication has been studied extensively in eukaryotes, no tool was available for identifying duplicated genes in bacterial genomes. We therefore developed a web-based tool called triP to rapidly identify bacterial duplicated genes. Klebsiella pneumoniae HS11286 was used to assess the tool. There are 11 putative duplicated genes in 5 groups, from 46 candidates, that have homology with DEG. Two of the 5 groups of duplicated genes have been described in the literature, which indicates that the prediction of duplicated genes by triP is reliable to a certain extent.
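The abstract compares the ncRNA predictors by sensitivity, specificity and positive predictive value, and relates their performance to genomic G+C content. As a minimal sketch of how these quantities are conventionally computed (not the thesis's own evaluation code), the example below uses the standard confusion-matrix definitions. The class and function names, and the counts in the usage lines, are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing a tool's predictions with known ncRNA genes."""
    tp: int  # known ncRNA genes that the tool predicted
    fp: int  # predictions that match no known ncRNA gene
    tn: int  # negative regions that the tool correctly left unpredicted
    fn: int  # known ncRNA genes that the tool missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: Confusion) -> float:
    """Fraction of known ncRNA genes that were recovered: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of negatives correctly rejected: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def positive_predictive_value(c: Confusion) -> float:
    """Fraction of predictions that are true ncRNA genes: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return _ratio(gc, gc + at)


# Hypothetical counts, for illustration only (not data from the thesis).
example = Confusion(tp=30, fp=10, tn=90, fn=20)
print(sensitivity(example), specificity(example),
      positive_predictive_value(example))
print(gc_content("ATGCGC"))
```

A tool that makes few but well-supported predictions tends to score high on specificity and positive predictive value while missing more known genes, which is the pattern the abstract reports for sRNAPredict relative to PORTRAIT.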
Keywords/Search Tags:Bioinformatics prediction, text mining, genomic island, subtractive hybridization, type IV secretion systems, non-coding RNA, duplicated genes