| Mass spectrometry-based protein quantification is currently an important method for studying cancer, but difficulties in analyzing protein mass spectrometry data, especially limited identification capability, have constrained its development. Effective analysis methods for such data are still lacking. Although many identification tools exist, their spectrum identification rates are not very high, and effective analysis of protein mass spectrometry data still requires considerable exploration. If deep learning methods can be used to classify and identify protein mass spectrometry data and to optimize the processing workflow, this will provide a reference for future researchers analyzing such data. Cancer research often focuses on the differences between disease samples and healthy samples. If feature visualization methods can be used to find important information that cannot otherwise be identified in mass spectrometry data, this will be of great significance for the analysis of protein mass spectrometry data. Against this background, this study introduces deep learning methods into the classification and identification of tumor protein tandem mass spectrometry data, in order to improve classification performance and to find important differences between samples. In this paper, we analyze three public tumor datasets from the iProX and PRIDE databases and propose an effective data preprocessing and feature extraction method suited to the high dimensionality and high noise of protein mass spectrometry data. First, we perform preliminary filtering of the high-dimensional mass spectrometry data by mass-to-charge ratio to remove some of the features with severe noise. Then, we use a support vector machine to select the informative mass-to-charge ratios and construct the training data from their corresponding intensity values and retention times. Finally, we classify the processed
data using deep learning methods and compare the results with a variety of traditional machine learning methods. The experimental results show that a one-dimensional convolutional neural network achieves the best classification performance under ten-fold cross-validation on the liver cancer dataset (HCC), with the gastric cancer dataset (DGC) used as an independent validation set. To further verify the validity of the results, we also compared our approach with traditional identification and quantification methods, and the results showed that the method used in this paper achieved the best classification performance. In addition, we analyze the classification results with feature visualization methods such as SHAP and Grad-CAM to try to find important differences between samples. The continuous development of protein mass spectrometry technology and the accumulation of mass spectrometry data not only pose new challenges for data processing methods but also provide opportunities to explore new analytical methods. This study shows the great potential of deep learning methods for protein mass spectrometry data analysis. We believe that, as research continues, more effective analytical methods will be applied to the study of protein mass spectrometry. |
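
The preliminary mass-to-charge filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the filtering criterion, so everything specific here is an assumption made for illustration: peaks are represented as `(mz, intensity, retention_time)` tuples, m/z values are grouped into fixed-width bins (`bin_width`), and a bin is kept only if it is observed in at least `min_fraction` of the samples. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mz_bin(mz, bin_width=0.01):
    """Map an m/z value to an integer bin index (bin_width is an assumed tolerance)."""
    return int(round(mz / bin_width))


def frequent_mz_bins(samples, min_fraction=0.5, bin_width=0.01):
    """Return the set of m/z bins observed in at least min_fraction of samples.

    samples: list of samples, each a list of (mz, intensity, retention_time) peaks.
    The prevalence criterion is an assumption, not the paper's stated rule.
    """
    counts = Counter()
    for peaks in samples:
        # Count each bin once per sample, however many peaks fall into it.
        counts.update({mz_bin(mz, bin_width) for mz, _, _ in peaks})
    threshold = min_fraction * len(samples)
    return {b for b, n in counts.items() if n >= threshold}


def filter_sample(peaks, kept_bins, bin_width=0.01):
    """Keep only the peaks whose m/z bin survived the prevalence filter."""
    return [p for p in peaks if mz_bin(p[0], bin_width) in kept_bins]
```

Under these assumptions, the surviving bins would define the candidate feature set passed on to the support vector machine selection step, with each kept peak contributing its intensity and retention time.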