| Tuberculosis is caused by Mycobacterium tuberculosis (MTB) infection and remains one of the most lethal diseases. According to the global tuberculosis report of the World Health Organization (WHO), about 1.7 billion people worldwide were infected with MTB in 2019, and there were about 10 million new tuberculosis patients, resulting in 1.41 million deaths. Although M. tuberculosis H37Rv, the type strain of M. tuberculosis, was sequenced and annotated as early as 1998, a comparison of the gene annotations from two major research institutions, the Sanger Institute and The Institute for Genomic Research, showed that 12% of the ORFs and 46% of the translation start sites (TSS) differed, which suggests that traditional genome annotation tools struggle to annotate the M. tuberculosis H37Rv genome accurately. Verifying annotated genes, correcting wrong annotations, and identifying missing coding genes in H37Rv could enhance our understanding of M. tuberculosis and provide new data and new gene candidates for the prevention and treatment of tuberculosis. Moreover, re-annotating the genome and identifying missing genes can serve as a reference for genome annotation research in other species. This study therefore has substantial research and application value. N-terminal peptides provide direct evidence for translation start sites, but they make up only a small fraction of the whole proteome, and some N-terminal peptides produced by protease digestion are difficult to detect because of their low molecular weight, low ionization efficiency, and other physicochemical properties. In addition, many N-terminal peptides carry a variety of post-translational modifications, which hinder chemical labeling and specific enrichment. This study therefore established a high-efficiency strategy of N-terminal dimethylation combined with negative enrichment, and optimized the experimental procedures by: (1) increasing the amount of experimental proteins to improve the abundance
of N-terminal peptides and improving the identification of N-terminal peptides by simplifying the sample; (2) using trypsin developed in our lab with independent intellectual property rights to improve the digestion efficiency; (3) using a more suitable pre-fractionation method and optimizing the elution gradient; (4) optimizing the parameters of LC-MS detection and database searching to improve identification. With this efficient and stable N-terminal negative enrichment technology, 2,728 proteins and 1,641 N-terminal peptides were identified against the annotated database. A total of 629 natural N-terminal peptides with N-terminal modification were used to verify the N-terminal annotation of 517 genes in the TubercuList database. The cyclization of glutamine and the combined action of glutamine cyclase and signal peptidase I were found to be related to virulence factor secretion in M. tuberculosis H37Rv, which helps in understanding the pathogenic mechanism and virulence of H37Rv. Proteins that retained their initial amino acid were mainly clustered in lipid homeostasis, fatty acid β-oxidation, and strain growth, which may be related to the dormancy or latent state of M. tuberculosis H37Rv when it faces a harsh environment with insufficient oxygen and nutrition. In addition, we identified 3,118 ORFs and 16,824 peptides, including 1,267 N-terminal peptides. After strict data filtering, 45 un-annotated peptides were identified. Based on these 45 peptides, 12 translation initiation sites of annotated proteins were corrected and 6 novel genes were found. These results substantially improve the existing genome annotation of H37Rv, and 3 of the novel genes found in this study are species-specific and have the potential to be developed as tuberculosis-specific antigens. |