| In recent years, with the increasing demand for chemical and biological sample analysis, a variety of analytical techniques have been developed, among which mass spectrometry is one of the most widely used in analytical chemistry. In response to the demand for mass spectrum recognition, research on fast and accurate sample identification by retrieval has made continuous progress: a variety of similarity measures and probability-based methods have appeared in succession, and recognition accuracy has gradually improved. These similarity measures are currently adopted in most commercial software. However, many molecules in mass spectral libraries remain difficult to distinguish. To address this problem, this paper studies mass spectrometry data identification. Traditional mass spectrum identification relies on similarity measures and probability-based methods, but the presence of isomers in the mass spectral library leads to a high rate of false positive identifications. Deep learning has developed rapidly in many fields and is now widely used in chemical informatics. This paper proposes a deep classification model to assist chemical mass spectrometry data retrieval and develops a pairwise labels algorithm for chemical mass spectrometry data retrieval. Verified on the NIST05 data set, retrieval based on deep learning greatly improves accuracy compared with traditional mass spectrum identification methods. (1) Because the mass spectral library contains many isomers, traditional similarity measures often fail to identify spectra correctly. To solve this problem, this paper uses a two-stage method of rough screening followed by accurate classification. In the first stage, rough screening, the similarity between similar mass spectra is increased by the weighted dot product
method, the computational cost is then reduced by random projection, and finally the top ten most similar mass spectra in the reference database are retrieved by weighted cosine similarity. In the second stage, accurate classification, the top N mass spectra are selected from the top ten and paired with the corresponding mass spectra in the query library, a binary classification model is trained on these pairs through the designed network, and the trained binary classifier is finally applied to the top ten mass spectra to refine and re-rank them. (2) The first method of assisting chemical mass spectrometry data retrieval still has several problems: the process is complex, the cost is high, and convergence is slow. This paper therefore uses a pairwise labels algorithm for chemical mass spectrometry data retrieval to address these problems. First, the similarity between mass spectra is increased by the weighted dot product method. To reduce the computational burden, paired data sets are constructed batch by batch for training the model that extracts mass spectrum features. The adopted objective function increases the similarity between features of mass spectra of the same molecule while reducing the similarity between features of mass spectra of different molecules. Experiments on the NIST05 data set show that the proposed mass spectrum recognition based on deep learning can be better applied to mass spectral database retrieval, which verifies the effectiveness of the method. |
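
The rough screening stage described in (1) can be illustrated with a minimal sketch. The abstract does not give the weighting formula, the projection size, or the data layout, so everything concrete here is an assumption made for illustration: spectra are dictionaries mapping integer m/z to intensity, peaks are weighted as `intensity**a * mz**b` with placeholder exponents, and the function names (`weight_spectrum`, `random_projection_matrix`, `rough_screen`) are hypothetical.

```python
import math
import random


def weight_spectrum(peaks, dim, a=0.5, b=1.0):
    """Turn {m/z: intensity} into a dense vector of weighted intensities.

    Each peak is scaled as intensity**a * mz**b. The exponents are
    placeholder defaults (assumption), not values taken from the paper.
    """
    vec = [0.0] * dim
    for mz, intensity in peaks.items():
        if 0 <= mz < dim:
            vec[mz] = (intensity ** a) * (mz ** b)
    return vec


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix mapping in_dim features to out_dim features."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vec):
    """Multiply the projection matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rough_screen(query, library, dim=64, proj_dim=16, top_k=10):
    """Return the names of the top_k library spectra most similar to query."""
    matrix = random_projection_matrix(dim, proj_dim)
    q = project(matrix, weight_spectrum(query, dim))
    scored = [(cosine(q, project(matrix, weight_spectrum(peaks, dim))), name)
              for name, peaks in library.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Toy data (invented for the example, not from NIST05).
library = {
    "A": {10: 100.0, 20: 50.0, 30: 5.0},
    "B": {10: 5.0, 25: 80.0, 40: 100.0},
    "C": {15: 100.0, 35: 60.0},
}
query = {10: 100.0, 20: 50.0, 30: 5.0}
hits = rough_screen(query, library, top_k=2)
```

In this sketch the same random matrix is applied to the query and to every library spectrum, so cosine similarities are compared in the same reduced space. The candidates returned by `rough_screen` would then be passed to the second-stage classifier for re-ranking.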
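
The objective described in (2) pulls features of the same molecule together and pushes features of different molecules apart. The abstract does not state its exact form, so the sketch below substitutes one common pairwise-label formulation, a logistic likelihood over feature inner products, purely as an assumption. The function name `pairwise_label_loss` and the batch layout (a list of feature vectors with one molecule label each) are hypothetical.

```python
import math


def pairwise_label_loss(features, labels):
    """Mean negative log-likelihood over all pairs in a batch (assumed form).

    theta_ij = 0.5 * <f_i, f_j>. Pairs with the same label (s_ij = 1) are
    rewarded for a large theta, pairs with different labels (s_ij = 0)
    for a small theta.
    """
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            theta = 0.5 * sum(a * b for a, b in zip(features[i], features[j]))
            s = 1.0 if labels[i] == labels[j] else 0.0
            # log(1 + exp(theta)), computed stably for large |theta|
            softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
            total += softplus - s * theta
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy batch: the first two features point the same way, the third is opposite.
batch = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
consistent = pairwise_label_loss(batch, ["x", "x", "y"])
inconsistent = pairwise_label_loss(batch, ["x", "y", "x"])
```

The loss is lower when the labels agree with the geometry of the features (`consistent`) than when they contradict it (`inconsistent`), which is the behaviour the abstract attributes to its objective function. Building all pairs inside one batch, as done here, is one way to read the batch construction mentioned in the text.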