| Variation at the genomic level in humans, such as genetic mutations and single nucleotide polymorphisms (SNPs), has been shown to correlate strongly with phenotypic variation, including disease. High-throughput technologies applied to the human genome have already identified millions of genetic variations. Those located in coding regions can alter the corresponding amino acids, producing variation at the amino acid level known as single amino-acid polymorphisms (SAPs). Although studies have been conducted to identify SAPs globally, only a small number have been detected, owing to the low coverage of mass spectrometry experiments and the inadequacy of existing protein variation databases. In this article, we first built a comprehensive human variation database, with mutation data collected from eight related databases: the NCBI dbSNP database, the Ensembl variation database, the Catalogue Of Somatic Mutations In Cancer (COSMIC), the protein mutant database (PMD), the human protein mutant database (HPMD), the UniProt variation database, the MSIPI database and the MS-CanProVar database. We then put forward a workflow to identify mutated peptides and their associated proteins from a large body of proteomic mass spectrometry data (11,113 experiments). After strict quality control, we detected 54,244 variant peptides (29,431 nsSNP peptides and 24,813 mutant peptides), mapped to 4,524 genes, which improved the mass spectra identification rate. We also constructed a dynamic web site to store all the variations and their related information. Spectra are displayed as well, so that readers can see the mutation sites, mass shifts and other information directly. Further analysis showed large differences in variation between tissues, although some variations were common to all of the tissues examined. We also found that aromatic amino acids, which have stable structures, were less prone to mutation.
Functional analysis (GO and KEGG pathway) of cancer-specific variations with DAVID showed that the mutated proteins were enriched in several important pathways, suggesting that protein variation is strongly correlated with phenotypic variation, including disease. We believe that the protein variation database we constructed will be a valuable resource for further analysis. |