Combined Multidimensional Liquid Chromatography Tandem Mass Spectrometry-Based Application In Shigella Proteogenomics

Posted on: 2011-09-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L N Zhao
GTID: 1220330374973803
Subject: Microbiology
Abstract/Summary:
As new sequencing strategies continue to appear, genome sequencing has become routine work. Accurate annotation of these genomes, however, remains the bottleneck in knowledge acquisition. Although computational prediction of protein-coding genes has grown more reliable, some annotations are inevitably erroneous because of the inherent limitations of prediction algorithms. In the past several years, proteogenomics has emerged as a powerful tool for improving genome annotations and finding novel genes. Peptides identified by MS/MS can be mapped back to their genomic locations, integrating proteomics data with the genome annotation. Developing a rapid, high-throughput proteogenomic approach remains challenging. Research has shown that proteome coverage improves greatly when identifications from two-dimensional liquid chromatography matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (2-D LC-MALDI-TOF/TOF) are combined with those from two-dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC-ESI-MS/MS). Because genome sequences have been completed for representative strains of all four Shigella serogroups, Shigella is an ideal model for proteogenomics research.

Here, taking advantage of combined analysis by 2-D LC-MALDI-TOF/TOF and 2-D LC-ESI-MS/MS, we propose a new design to improve the existing genome annotation of S. flexneri 2a str. 301.

First, bacterial cytosolic and membrane proteins were prefractionated and enriched separately using different prefractionation protocols. Protein mixtures were digested with trypsin, and the resulting peptides were separated by 2-D LC and then analyzed by off-line MALDI-TOF/TOF and on-line ESI-MS/MS.
MS/MS spectra were searched with MASCOT and SEQUEST against a database of all possible six-frame translations generated from the S. flexneri genome. A total of 1231 proteins were identified. Their distributions of pI, MW, and GRAVY were similar to those of the 4443 annotated proteins of S. flexneri 2a str. 301. The identified proteins fell into 20 of the 22 functional categories in the clusters of orthologous groups of proteins (COGs), suggesting that our results reflect the protein composition of the biological samples. In addition, 306 hypothetical proteins were validated (16% of the 1944 predicted in silico). Using an N-terminal extension database and RT-PCR, the incorrect translation start sites of three ORFs were amended. Two ORFs that could not be translated successfully, because sequencing errors had introduced apparent stop-codon "mutations", were revised: the pseudogene zwf was re-annotated as a gene encoding glucose-6-phosphate dehydrogenase, and fusA was extended by a further 240 bp. Above all, 34 new ORFs were discovered, comprising 5 genes annotated in other enterobacteria and 29 novel genes. Nine of these 34 ORFs were confirmed by RT-PCR or Northern blot. Their unknown biological functions merit further study.

Additionally, various properties of the identified peptides, including K/R ratio, physicochemical properties, and length, were compared between the two mass spectrometers. Our findings suggest that MALDI-TOF/TOF favored short, basic peptides and peptides with a C-terminal arginine, whereas ESI-MS/MS favored long, hydrophobic peptides and peptides with a C-terminal lysine. In summary, the optimized combined system not only greatly improved the quality of identifications but also increased the number of identified proteins.

As described above, the combined system, 2-D LC-MALDI-TOF/TOF together with 2-D LC-ESI-MS/MS, was used to improve genome annotation for the first time.
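The peptide-level comparison mentioned above (K/R ratio and peptide length) can be illustrated with a minimal sketch. The function name `c_terminal_profile` and both peptide lists are hypothetical and serve only as an illustration; they are not data or code from the dissertation.

```python
from collections import Counter


def c_terminal_profile(peptides):
    """Summarize peptides by C-terminal residue (K vs. R) and mean length."""
    peptides = [p for p in peptides if p]  # ignore empty strings
    if not peptides:
        raise ValueError("no peptides given")
    counts = Counter(p[-1] for p in peptides)
    k, r = counts.get("K", 0), counts.get("R", 0)
    return {
        "K": k,
        "R": r,
        # K/R ratio; infinite when no arginine-terminated peptide is present
        "K/R": k / r if r else float("inf"),
        "mean_length": sum(len(p) for p in peptides) / len(peptides),
    }


# Hypothetical peptide lists, for illustration only (not thesis data).
maldi_peptides = ["AGHR", "LSVR", "TPEK", "MHR"]
esi_peptides = ["LLVAAGIFLDEGK", "VTAILGEVSPEMK", "AFLVDLNPQR"]

maldi_profile = c_terminal_profile(maldi_peptides)
esi_profile = c_terminal_profile(esi_peptides)
print(maldi_profile)
print(esi_profile)
```

With real identifications, a lower K/R ratio and shorter mean length for the MALDI set than for the ESI set would be consistent with the preferences reported here.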
Owing to the complementary characteristics of ESI and MALDI, the combined system outperformed either tandem mass spectrometry method alone in both the quality and the quantity of identifications. Combined proteomic results together with bioinformatics analysis proved well suited to comprehensive and accurate genome-wide annotation, including validation of annotated genes, confirmation of hypothetical genes, revision of incorrect translation start sites, correction of pseudogenes, and especially discovery of novel genes. This strategy could become a routine part of genome annotation in other organisms.
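The six-frame translation and peptide-to-genome mapping that underlie this kind of annotation can be sketched roughly as follows. This is a simplified illustration, not the pipeline used in the dissertation: the actual searches were spectrum-based (MASCOT and SEQUEST), whereas this sketch only does exact string matching of a peptide sequence against translated frames. All function names and the toy DNA sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Map (strand, offset) to the translation of that reading frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_peptide(dna, peptide):
    """Locate a peptide in all six frames.

    Returns (strand, start, end) tuples with 0-based, half-open
    coordinates on the forward strand.
    """
    n = len(dna)
    hits = []
    for (strand, offset), protein in six_frame_translate(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if strand == "-":
                # convert reverse-strand coordinates to forward-strand ones
                start, end = n - end, n - start
            hits.append((strand, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence, for illustration only.
toy_dna = "ATGGCCAAATAA"
frames = six_frame_translate(toy_dna)
forward_hits = map_peptide(toy_dna, "MAK")
reverse_hits = map_peptide(toy_dna, "LFGH")
print(frames[("+", 0)], forward_hits, reverse_hits)
```

Comparing the returned coordinates with annotated ORF boundaries is what would distinguish a hit inside an annotated gene from one upstream of an annotated start site or in an unannotated region.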
Keywords/Search Tags: genome annotation, S. flexneri, proteogenomics, 2-D LC-MALDI-TOF/TOF, 2-D LC-ESI-MS/MS