
Research On Specific Sepsis Support Vector Machine Diagnostic Model Based On Multi-omics Data

Posted on: 2024-06-08  Degree: Master  Type: Thesis
Country: China  Candidate: S Ma  Full Text: PDF
GTID: 2544307160491404  Subject: Epidemiology and Health Statistics
Abstract/Summary:
Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. It is a significant global public health problem: one in five deaths worldwide is related to sepsis. Among critically ill patients in the intensive care unit (ICU), the incidence of sepsis is 20.6% (95% confidence interval [CI], 15.8%-25.4%), with a 90-day mortality rate of 35.5%. Sepsis therefore consumes considerable medical resources each year. While any infection can lead to sepsis, the most common pathogens are bacteria, including Staphylococcus aureus bacteria (SaB), Streptococcus pneumoniae (S. pneumoniae), and Escherichia coli (E. coli). Studies have shown that, because sepsis is highly heterogeneous, mortality rates vary with the causative pathogen. In clinical practice, the non-specific and complex triad of sepsis is difficult to identify. Traditional laboratory methods such as blood culture identification, immunological testing, nucleic acid probe hybridization, and polymerase chain reaction amplification are slow and struggle to detect multiple pathogens simultaneously. Sepsis is a complex, time-dependent disease with a high mortality rate that can even progress from septic shock to the ICU within 24 hours, and delayed diagnosis and treatment of septic shock during the initial 6 hours in the ICU are strongly associated with increased mortality. Currently, no specific or sensitive indicator can be used routinely in clinical practice, and traditional methods that detect a single biomarker cannot efficiently identify patients at risk of sepsis. As multi-omics analysis technology matures and its applications expand, understanding of the occurrence and development of sepsis at the cellular and molecular levels has improved, and patients with early sepsis can be identified more efficiently.

Methods: This study used data from two sources. The first was the Community Acquired Pneumonia and Sepsis Outcome
Diagnosis (CAPSOD) study (ClinicalTrials.gov NCT00258869), and the second was a study of the immune response to Staphylococcus aureus at the University of Wisconsin-Madison Health Center, conducted under Institutional Review Board (IRB) approval number #20180098. Serum testing data collected within 24 hours before medication were extracted from 197 patients with Staphylococcus aureus sepsis, 84 patients with non-Staphylococcus aureus sepsis, and 48 control patients without sepsis. After comparing the effectiveness of nine feature selection methods and considering model fit, the Support Vector Machine Forward Selection method was chosen for single-omics feature selection. For multi-omics feature selection, the Partial Least Squares Discriminant Analysis method was used for screening, and a post-fusion strategy integrated the two omics datasets. The support vector machine (SVM) algorithm, which has excellent classification and prediction ability, served as the basis for three diagnostic models: a metabolomics SVM model, a proteomics SVM model, and a metabolomics-proteomics SVM model. The models were evaluated on five indicators: accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUROC). The z-test was used to determine whether the AUROC differed between models, with p<0.05 indicating a statistically significant difference. Finally, Shapley values were used to evaluate the contribution of each feature in the model to the classification results.

Results:

1. Principal Component Analysis

Dimensionality reduction by principal component analysis (PCA) can display differences in the structure of metabolite and protein composition between samples. The metabolite features could not effectively separate the samples, which suggests that the selected features might not be robust enough or that the differences between samples were not
captured by these features. In contrast, PCA effectively distinguished the control and sepsis groups based on the protein features, implying that these features discriminate well between the two groups. However, the classification performance for the non-SaB sepsis group was generally poor, which might be attributed to the significant overlap in metabolite and protein composition structure between this group and the other two groups.

2. Result of the Feature Selection Method

This study compared the efficacy of eight feature selection methods for analyzing single-omics data and determined that the Support Vector Machine Forward Selection method was the most effective. For feature selection in multi-omics data, the Partial Least Squares Discriminant Analysis method was employed to select metabolite and protein features. The top five important features were identified using each method.

3. SVM model for early diagnosis of sepsis

We constructed three early diagnosis models using the SVM algorithm: one based on non-targeted metabolomics features (AUROC = 0.50 ± 0.004, ACC = 0.63, Pre = 0.39, Recall = 0.63, F1-score = 0.48), one based on proteome features (AUROC = 0.89 ± 0.006, ACC = 0.87, Pre = 0.91, Recall = 0.87, F1-score = 0.87), and a multi-omics SVM model based on metabolomics-proteomics features (AUROC = 0.95 ± 0.0002, ACC = 0.86, Pre = 0.88, Recall = 0.86, F1-score = 0.86). Statistical analysis showed that the p values for the pairwise AUROC comparisons among the three models were all <0.001. The multi-omics model achieved the highest AUROC and showed a relative advantage. Using the Shapley value, we identified the three features that contribute the most to the classification results of the multi-omics SVM model: AHSG, phenylacetylglutamine, and PRG4.

4. Bioinformatics Analysis

Gene Ontology (GO) term enrichment analysis and metabolic pathway enrichment analysis (Kyoto Encyclopedia of Genes and Genomes, KEGG) were performed for the genes corresponding to the proteins
selected by the multi-omics model. Enrichment analysis at the biological process level revealed that these genes were primarily involved in processes such as peptide regulation, acute inflammatory response, and insulin response. At the cellular component level, the genes were mainly enriched in components such as collagen extracellular matrix, endoplasmic reticulum lumen, platelet alpha-granule lumen, and blood microparticles. The metabolic pathway enrichment analysis identified five protein-coding genes associated with type 2 diabetes, the adipocytokine signaling pathway, the peroxisome proliferator-activated receptor signaling pathway, and the bacterial invasion of epithelial cells pathway. The five metabolites identified in this study were mainly enriched in the synthesis of taurine and bile acids and were associated with diseases such as alpha-1-antitrypsin deficiency, Wilson's disease, carnitine transporter defect, and systemic carnitine deficiency. Our findings also revealed that AHSG, FN1, and PRG4 are closely related and co-expressed in the protein interaction network, while ADIPOQ and FN1 play important roles in the adipocytokine signaling pathway and the bacterial invasion of epithelial cells pathway, respectively.

Conclusion: In this study, the multi-omics SVM model outperformed the single-omics models in predicting patient status. Bioinformatics analyses, including GO enrichment analysis, KEGG enrichment analysis, and protein interaction network analysis, revealed that bile acid levels, the expression levels of the intestinal microbial genes porA and fldH, PAGln, serum copper levels, and upregulated PRG4 may serve as early biomarkers of sepsis. In addition, a reduction in multiple acylcarnitines together with a significant reduction in free carnitine (<10 μmol/L) should be treated as a warning signal for suspected sepsis, and SLC22A5 gene mutation can be used as an auxiliary diagnostic test. Protein function analysis found that FN1 may serve as an immune-related
checkpoint in sepsis. The five genes encoding the proteins in the multi-omics model are related to inflammatory diseases or to anti-inflammatory and antibacterial pathways. Bioinformatics analysis supports the reliability of the multi-omics model, though further validation in rigorous animal experiments and larger independent datasets is needed to explore its clinical value.
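
The z-test comparison of AUROC values described in the Methods can be illustrated with a short sketch. It is not the thesis's actual procedure: it assumes that each reported "±" value is a standard error and that the two AUROC estimates being compared are independent, neither of which the abstract states. The function name `auroc_z_test` is hypothetical.

```python
import math


def auroc_z_test(auc_a, se_a, auc_b, se_b):
    """Two-sided z-test for the difference between two independent AUROCs.

    Assumes se_a and se_b are standard errors and the two estimates are
    uncorrelated (an assumption; correlated models need a covariance term).
    """
    z = (auc_a - auc_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# AUROC values reported in the Results; treating "±" as a standard error
# is an assumption made for illustration only.
models = {
    "metabolomics": (0.50, 0.004),
    "proteomics": (0.89, 0.006),
    "multi-omics": (0.95, 0.0002),
}

names = list(models)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        z, p = auroc_z_test(*models[second], *models[first])
        print(f"{second} vs {first}: z = {z:.2f}, significant at 0.05: {p < 0.05}")
```

If the models were evaluated on the same patients, the AUROC estimates would be correlated, and a paired method that accounts for the covariance would be more appropriate than this independent-sample form.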
Keywords/Search Tags: Sepsis, multi-omics, support vector machine (SVM), susceptibility, early diagnosis