Study On Protein Spectral Library Searching Strategy Based On Deep Learning

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Qin
GTID: 2370330614958612
Subject: Biomedical engineering

Abstract/Summary:
Spectral library searching tools are frequently used in mass spectrometry-based proteomics analyses. As their core function, spectral similarity calculation directly determines the performance of these tools. Spectral similarity calculation usually consists of two parts, spectral feature extraction and a scoring function, and feature extraction plays an important role in overall performance. However, in existing spectral library searching tools, the design of spectral feature extraction depends mainly on the prior knowledge of researchers, and the resulting feature sets contain only a few parameters, so they cannot make effective use of the large number of hidden features in the spectra. At the same time, the computing performance of the searching tools directly limits the ability to mine knowledge from large spectral data sets. In particular, existing tools need to extract spectral features in every run, so the same calculation may be repeated many times, which wastes computing resources. In view of these shortcomings of spectral library searching tools and their feature extraction functions, this thesis completes the following work:

1. This study trains a deep learning-based spectral feature extraction model called "DLEAMSE" (Deep LEArning MS/MS Spectra Embedder). DLEAMSE is trained with a Siamese network structure, and its training and testing data sets are built from high-quality spectra clusters from PRIDE Cluster, which cover a wide range of instruments and species. The results show that DLEAMSE performs well on the testing data set (area under the receiver operating characteristic (ROC) curve, AUC = 96.2%). This study also searches a spectra set from the PRIDE Cluster human spectral library against itself using Faiss. The results show that 95.34% of the spectra have correct spectrum-spectrum matches, which indicates that spectra generated by different peptides lie far enough apart in the embedding space to be distinguished by a threshold.

2. To compare traditional spectral similarity scoring methods and to evaluate DLEAMSE-based similarity scoring, the deep learning-based methods (DLEAMSE and the previously proposed model GLEAMS) are compared with five traditional methods on Arabidopsis, mouse and yeast data sets. The results show that the DLEAMSE-based method ranks behind the normalized dot product and the Pearson correlation coefficient, and performs better than GLEAMS. In terms of computing performance, the deep learning-based methods have the advantage when processing large-scale data.

3. To build a faster spectral library search tool for large spectral data sets, this study proposes a spectral library searching method based on DLEAMSE. Its implementation is based on Faiss and has two sub-processes: building an index of the spectral library, and searching against that Faiss index. This study compares the method with SpectraST on NIST's human spectral library and on MHCCLM3 cell line spectra data. The results show that the DLEAMSE-based method has good spectra identification performance and an advantage in computing performance: its search run-times are less than 1/10000 of SpectraST's, which indicates that the method is suitable for large-scale data processing.

In short, the DLEAMSE model and the new spectral library searching method proposed in this thesis are novel and make a positive contribution to big data analysis in proteomics.
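
For readers unfamiliar with the traditional baselines named above, the following is a minimal sketch of the normalized dot product and the Pearson correlation coefficient computed on binned spectra. It is not the thesis's implementation: the function names, the 1.0 m/z bin width, the 2000 m/z upper limit, and the toy peak lists are all assumptions made for illustration.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins (assumed binning scheme)."""
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec

def normalized_dot_product(a, b):
    """Cosine of the angle between two binned intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation coefficient between two binned intensity vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Toy (m/z, intensity) peak lists, invented for illustration only.
query = bin_spectrum([(175.1, 30.0), (304.2, 100.0), (417.3, 55.0)])
similar = bin_spectrum([(175.1, 28.0), (304.2, 95.0), (417.3, 60.0)])
unrelated = bin_spectrum([(201.1, 80.0), (350.2, 40.0), (512.3, 100.0)])

print(normalized_dot_product(query, similar))    # close to 1 for matching peaks
print(normalized_dot_product(query, unrelated))  # 0.0, since no bins are shared
print(pearson_correlation(query, similar))
```

Both scores are recomputed from the raw peaks for every pair of spectra, which is the repeated feature extraction cost the abstract describes. An embedding-based method instead computes one vector per spectrum once and reuses it across searches.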
Keywords/Search Tags:proteomics, MS/MS spectra, spectral feature extraction, deep learning, spectral library searching