
Prediction Of Depression Classification And Biomarker Discovery Based On Small Sample Plasma Mass Spectrometry Data

Posted on: 2024-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Tu
Full Text: PDF
GTID: 2544307076492844
Subject: Electronic information
Abstract/Summary:
Depression is a common mental disorder, and according to clinical research statistics, about 27% of people will experience depression or depression-like symptoms at some point in their lives. Early diagnosis and treatment of depression are therefore of great significance. Traditional diagnosis of depression is usually based on the patient’s self-reported symptoms and the results of rating scales, so it depends on the clinician’s experience and medical knowledge. As a result, it is subjective to some degree and prone to misdiagnosis or missed diagnosis. Clinical practice therefore urgently needs an objective diagnostic method to improve the diagnostic rate of depression.

Proteomics studies changes in protein expression at the level of the whole proteome. Analyzing protein expression in biological samples can improve our understanding of the pathophysiological mechanisms of a disease and support the development of clinical diagnostic tools. Previous studies have shown that plasma proteins are differentially expressed in patients with depression. This study therefore uses plasma proteomic data to predict depression classification and to discover biomarkers.

Proteomics data are obtained by identifying proteins from mass spectrometry data collected with liquid chromatography-tandem mass spectrometry. Such data are typically "large p, small n": there are many features but few samples. Conventional machine learning often gives unsatisfactory results on such data. This study therefore approaches the problem from the perspective of small-sample learning and proposes a depression classification prediction model for small-sample plasma proteomics data. Compared with traditional machine learning models, the classification accuracy of this model is significantly improved.

Biomarkers are objective measures of biological states. Discovering depression-related biomarkers can help us better understand the pathogenesis of depression and provide a basis for developing more convenient and effective diagnostic methods. This study also proposes an interpretability-based analysis method for discovering protein biomarkers for depression. The main work is as follows.

(1) Plasma mass spectrometry data preprocessing and data analysis. Protein identification was performed on the subjects’ plasma mass spectrometry data to obtain proteomics data, including protein quantification information, for each sample. Because the data contained contaminant proteins, noise, and missing values, further preprocessing was carried out to obtain the final experimental dataset. Based on this dataset, we analyzed the similarities and differences between the plasma proteomics data of patients with depression and those of healthy individuals.

(2) A depression classification prediction model based on small-sample plasma proteomics data. Depression proteomics data generally have high feature dimensionality and a small number of samples. To address this, the proposed model builds on the Hierarchical Graph Convolutional Network (HIGCN) and Jumping Knowledge Networks (JKNet). A causal sample weight module is introduced to minimize the impact of feature correlation and improve the model’s generalization ability. The Drop Edge idea is introduced to alleviate the overfitting of deep learning models on small-sample datasets during training, which improves the model’s prediction accuracy. Finally, the effectiveness of the method was verified through experiments on a public dataset.

(3) A biomarker discovery method for depression based on model interpretability analysis. First, proteins differentially expressed between depression patients and healthy individuals were screened using statistical analysis methods. These differentially expressed proteins were then used as features to train a classification model. Next, the model-agnostic interpretability methods LIME and SHAP were used to explain the model’s predictions. The final depression protein biomarkers were determined from the global importance ranking of the protein features. The study also conducted statistical and biological analyses of the discovered protein biomarkers to validate their effectiveness and reliability. Finally, the discovered protein biomarkers were used as features to train a classification model to test their classification ability.
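
As an illustration of the Drop Edge idea mentioned in part (2), the following is a minimal sketch only, not the thesis implementation. It assumes the graph is stored as a plain list of undirected edges; the function name `drop_edge`, the `drop_rate` parameter, and the toy graph are hypothetical and chosen for illustration. The sketch shows only the core step: before each training epoch, a random fraction of edges is removed, so the graph network sees a slightly different graph each time.

```python
import random


def drop_edge(edges, drop_rate, rng=None):
    """Return a sorted copy of `edges` with a random fraction removed.

    edges     -- list of (u, v) tuples, one per undirected edge (assumed format)
    drop_rate -- fraction of edges to remove, in [0, 1)
    rng       -- optional random.Random instance for reproducibility
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = rng or random.Random()
    # Number of edges that survive this round of sampling.
    n_keep = len(edges) - int(len(edges) * drop_rate)
    # Sample without replacement so no edge is duplicated.
    return sorted(rng.sample(edges, n_keep))


# Toy graph with 4 nodes and 5 edges (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
rng = random.Random(0)

# A fresh random subgraph is drawn for every training epoch;
# the full edge list would be used at evaluation time.
for epoch in range(3):
    sub_edges = drop_edge(edges, drop_rate=0.4, rng=rng)
    print(epoch, sub_edges)
```

In a real graph convolutional model, the sampled edge list would be turned into a normalized adjacency matrix before each forward pass; that step is omitted here.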
Keywords/Search Tags: depression, biomarker, causal sample weight, small sample learning