| Mass spectrometry (MS) is an analytical method applied extensively across many fields. It ionizes samples and separates the resulting ions using electric or magnetic fields to produce mass spectra. MS has been widely employed in proteomics and metabolomics for disease diagnosis and prognosis, as well as for the identification and quantification of metabolites in biological samples. However, the analysis of mass spectrometry data is a complex and data-intensive process. Computational methods based on machine learning and deep learning can handle and interpret these large volumes of data, providing professionals with faster and more accurate results than traditional manual approaches. Preprocessing plays a crucial role in the analysis and interpretation of mass spectrometry data: it reduces noise in the raw data, ensures data quality, and improves feature engineering by removing irrelevant features. This thesis proposes msDAWG, a preprocessing algorithm for mass spectrometry data based on a combination of a dual autoencoder (DAE) and a Wasserstein generative adversarial network (WGAN). Its aim is to eliminate data noise and enrich the information relevant to the classification targets. The algorithm consists of five modules: preprocessing, feature screening, feature construction, feature selection, and classification. These modules work together to process the raw mass spectrometry data for subsequent classification. The feature construction module includes three key structures: the dual autoencoder, the generative adversarial network, and the self-attention mechanism, which together achieve efficient feature engineering. The experiments are conducted on seven mass spectrometry datasets from three different spectrometers and two mammalian species. The experimental results show that the features extracted by msDAWG achieve classification AUC values exceeding 0.99 on six datasets and over 0.89 on the remaining challenging dataset, outperforming the five methods compared. Furthermore, the msDAWG
algorithm enriches the useful information in mass spectrometry data, allowing the classification model to achieve satisfactory performance with a small number of features. This efficient feature engineering algorithm demonstrates good generality and scalability, helping researchers analyze and process mass spectrometry data accurately and supporting further research on mass spectrometry data classification and its applications. msDAWG thus demonstrates the effectiveness of structures based on a dual autoencoder (DAE) and a Wasserstein generative adversarial network (WGAN) on human cancer mass spectrometry data. Building on this, to further explore how well this structure scales to mass spectrometry data of different disease types, this thesis proposes msDAGVote, an algorithm that integrates a generative adversarial autoencoder with voting-based feature selection and is designed specifically for predictive diagnosis from disease mass spectrometry data. msDAGVote combines a dual autoencoder and a generative adversarial network to construct feature vectors for disease mass spectrometry data. It then employs an ensemble voting-based feature selection algorithm to retain the constructed features that are highly relevant to disease classification. Finally, the selected features are used to classify disease mass spectrometry data, enabling predictive diagnosis of diseases. The experiments are conducted on mass spectrometry datasets of ten different disease types, and the average classification AUC across all datasets reaches 0.9503. The experimental results demonstrate that msDAGVote achieves accurate disease diagnosis and scales well across various disease types and data scales. The algorithm offers strong performance for disease diagnosis based on mass spectrometry data, improving the accuracy of clinical diagnosis and better safeguarding people’s health. |
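
To make the idea of voting-based feature selection described above more concrete, the following is a minimal sketch, not the thesis's implementation. The abstract does not specify which selectors vote or how votes are counted, so everything here is an assumption made for illustration: the three scoring functions (`mean_difference`, `abs_correlation`, `auc_separation`), the helper `vote_select`, its parameters `top_k` and `min_votes`, and the toy data. In msDAGVote the inputs would be the features constructed by the dual autoencoder and generative adversarial network; here plain numeric columns stand in for them.

```python
from statistics import mean


def _split(column, labels):
    """Split one feature column into positive-class and negative-class values."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return pos, neg


def mean_difference(column, labels):
    """Hypothetical voter 1: absolute difference between the class means."""
    pos, neg = _split(column, labels)
    return abs(mean(pos) - mean(neg))


def abs_correlation(column, labels):
    """Hypothetical voter 2: absolute Pearson correlation with the binary label."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    var_x = sum((x - mx) ** 2 for x in column)
    var_y = sum((y - my) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def auc_separation(column, labels):
    """Hypothetical voter 3: distance of the single-feature AUC from chance (0.5)."""
    pos, neg = _split(column, labels)
    # Pairwise (Mann-Whitney) estimate of the AUC; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return abs(wins / (len(pos) * len(neg)) - 0.5)


def vote_select(rows, labels, top_k=2, min_votes=2):
    """Keep the features that at least `min_votes` voters rank in their top `top_k`.

    This counting rule is an assumption; the source does not describe it.
    """
    columns = list(zip(*rows))  # transpose samples-by-features into columns
    votes = [0] * len(columns)
    for scorer in (mean_difference, abs_correlation, auc_separation):
        scores = [scorer(col, labels) for col in columns]
        ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            votes[j] += 1
    return [j for j, v in enumerate(votes) if v >= min_votes]


# Toy data (invented): features 0 and 2 separate the classes, 1 and 3 do not.
toy_rows = [
    [5.0, 1.0, 0.2, 3.0],
    [6.0, 1.1, 0.1, 3.1],
    [5.5, 0.9, 0.3, 2.9],
    [1.0, 1.0, 0.9, 3.0],
    [1.5, 1.1, 0.8, 3.1],
    [0.5, 0.9, 1.0, 2.9],
]
toy_labels = [1, 1, 1, 0, 0, 0]

selected = vote_select(toy_rows, toy_labels)
print(selected)  # [0, 2]
```

Combining several weak, differently biased criteria in this way is meant to make the selection less sensitive to the quirks of any single criterion; the surviving feature indices would then be passed to the classifier.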