Research On Identification Method And Algorithms Of Intact N-Glycopeptides Based On Mass Spectrometry Data

Posted on: 2020-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2370330602950692
Subject: Computer Science and Technology
Abstract/Summary:
Glycosylation is one of the most prevalent and important post-translational modifications of proteins. Research shows that more than 50% of proteins in eukaryotes contain potential glycosylation sites. With the development of mass spectrometry technology, research focus has shifted from the identification of glycan structures and glycosylation sites in proteomics to site-specific glycan identification in glycoproteomics, i.e., analysis at the level of intact glycopeptides. Protein glycosylation plays important biological roles in protein folding, immunity, and signal transduction. In practice, many of the biomarkers and protein drugs used for the diagnosis of various diseases are glycoproteins, and for a large number of glycoproteins with important and specialized functions, the relationship between glycosylation and disease still needs further analysis. There is therefore an urgent need for algorithms and workflows that can resolve glycoprotein structures at large scale. However, because glycan structure databases are lacking and glycan structures are themselves complex, N-glycopeptide structure analysis remains highly challenging, and existing methods do not solve the problem satisfactorily.

Based on the relationships among glycan fragment features in mass spectrometry data, we constructed a B-ion structure database using an improved multiple-minimum-support association rule mining method, and applied this database to an intact N-glycopeptide identification workflow. The main contributions are as follows:

1. Mining the relationships among glycan fragment structures and constructing a B-ion structure library. Data items in a mass spectrometry transaction set differ in importance and in frequency of occurrence. To address this, the study combines a vertical data format with a pruning strategy and proposes a new multiple-minimum-support association rule algorithm for mining the associations among B-ion structures. The improved method avoids generating large numbers of unrealistic association rules during mass spectrometry data mining and can still mine rules that occur infrequently, which improves both the speed and the accuracy of association rule mining. Analysis of the experimental results further demonstrates the effectiveness of the improved method.

2. Designing a single-spectrum workflow for intact N-glycopeptide identification. Stepped HCD energies of 20-30-40% yield spectra rich in both glycan and peptide fragment ions. Building on this property and on the constructed B-ion structure library, the workflow takes mass spectrometry data as input, performs data preprocessing, peptide sequence identification, glycan structure identification, glycopeptide match scoring, and error rate estimation, and then outputs the N-glycopeptide identification results.

3. Comparing the proposed method with existing methods. Using mass spectrometry data from five mouse tissues (heart, liver, brain, lung, and kidney), the proposed method was compared with other existing methods, and the experimental results were interpreted in biological terms. The results verify the accuracy and effectiveness of the proposed algorithm and workflow for identifying intact N-glycopeptides.

In summary, the nonlinear glycan structure and the linear peptide sequence have different physical and chemical properties, which are reflected in the mass spectrometry data. This study analyzed the characteristics of peptide fragments and of glycan fragments separately and proposed corresponding methods to improve identification from mass spectrometry data. The workflow yields the specific structure of each N-glycopeptide, providing a theoretical basis and a method for the prediction, diagnosis, and treatment of disease.
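The idea behind the first contribution, mining with a separate minimum support per item over a vertical data format, can be illustrated with a minimal Python sketch. This is not the algorithm from the thesis, whose details the abstract does not give. It assumes the common convention that an itemset is frequent when its support reaches the lowest minimum support among its items, and it computes support by intersecting transaction-id sets. The fragment labels, thresholds, and the function name `mine_frequent_itemsets` are hypothetical toy values chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_frequent_itemsets(
    transactions: List[Set[str]],
    min_count: Dict[str, int],
) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support count} under per-item minimum supports.

    Assumption: an itemset is frequent if its support count is at least
    the smallest min_count among its items.
    """
    # Vertical format: map each item to the ids of transactions containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Sort items by ascending threshold. Every itemset grown from a root
    # item then has that root's threshold as its minimum, so the threshold
    # is constant inside a subtree and ordinary support pruning is valid.
    order = sorted(tidsets, key=lambda it: (min_count[it], it))
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], tids: Set[int], start: int, threshold: int) -> None:
        for j in range(start, len(order)):
            item = order[j]
            new_tids = tids & tidsets[item]  # support via tidset intersection
            if len(new_tids) < threshold:
                continue  # prune: no superset in this branch can be frequent
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(new_tids)
            extend(itemset, new_tids, j + 1, threshold)

    for i, item in enumerate(order):
        threshold = min_count[item]
        if len(tidsets[item]) < threshold:
            continue  # the root item itself is infrequent
        frequent[frozenset([item])] = len(tidsets[item])
        extend((item,), tidsets[item], i + 1, threshold)
    return frequent


# Toy data: each "transaction" is the set of fragment labels seen in one spectrum.
transactions = [
    {"HexNAc", "Hex"},
    {"HexNAc", "Hex", "NeuAc"},
    {"HexNAc", "Hex"},
    {"HexNAc", "Fuc"},
    {"HexNAc", "Hex", "NeuAc"},
    {"Hex"},
]
# Rare items get a lower threshold than common ones (hypothetical values).
min_count = {"HexNAc": 3, "Hex": 3, "NeuAc": 2, "Fuc": 2}

frequent = mine_frequent_itemsets(transactions, min_count)
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy example, a single global threshold of 3 would discard every itemset containing `NeuAc`, while the per-item thresholds retain them. That matches the abstract's point about still mining associations that occur infrequently.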
Keywords/Search Tags: Tandem Mass Spectrometry, N-glycosylation, B-ion Structure Database, Multiple Minimum Supports Association Rules