
Research On The Establishment Of Lung Cancer Diagnosis Models Based On Exhaled Breath Analysis And Machine Learning

Posted on: 2022-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: P Q Liao
GTID: 2504306524482134
Subject: Clinical Medicine
Abstract/Summary:
Background: Lung cancer is currently the most lethal malignancy, with a 5-year survival rate of only 17%. Patients with stage I NSCLC may reach a 5-year survival rate of about 70-90% when the disease is detected early. So far, however, most patients with lung cancer are already at an advanced stage at diagnosis, so early detection is crucial to improving their prognosis. Conventional early detection of lung cancer involves radiation exposure and has a high false-positive rate, and a simple, rapid, non-invasive method with high sensitivity is urgently required. Exhaled breath analysis has already been applied in clinical practice to assist disease diagnosis and is one of the most promising non-invasive approaches to early detection: volatile organic compounds (VOCs) produced by metabolism are carried by the circulation to the lungs, pass into the breath through gas exchange, and can therefore reflect the disease state of the human body. However, such analysis in lung cancer is still in its infancy, and there is currently no large-scale diagnostic and validation study.

Objective: (1) To analyze the composition of the exhaled breath of the included subjects, collect their clinical features, and establish a large-sample exhaled breath-clinical features database covering patients with lung cancer, patients with benign lung disease and healthy people; (2) to construct lung cancer diagnostic models and screen potential lung cancer-specific markers by comparing the VOCs in exhaled breath between patients with lung cancer and healthy people, and between patients with lung cancer and those with benign lung diseases, so as to provide a basis for using exhaled breath analysis to diagnose lung cancer in clinical practice.

Methods: Solid-phase microextraction (SPME) combined with gas chromatography-mass spectrometry (GC-MS) was used to analyze the composition of the exhaled breath of 1191 subjects with suspected lung cancer (lung cancer + benign lung disease) and 804 healthy people. The mass spectrometric data were used for qualitative analysis of VOCs. Clinical features of the subjects were collected and uploaded to the ResMan public clinical management platform to establish a large-sample breath-clinical features management platform. A Genetic Algorithm (GA) combined with a support vector machine (SVM) was then used to establish two models for different application scenarios: a lung cancer screening model (patients with lung cancer versus healthy people) and a probability model of benign and malignant pulmonary space-occupying lesions (lung cancer versus benign lung disease). To further improve the predictive ability of the probability model of benign and malignant pulmonary space-occupying lesions, an integrated model was constructed by adding clinical feature data. ROC curves were plotted to evaluate the performance of the models.

Results: The exhaled breath of 629 patients with lung cancer, 606 healthy subjects, and 139 patients with benign lung disease was finally analyzed. 64 VOCs were identified from the mass spectrum data. A large-sample exhaled breath-clinical features database of 1374 subjects was established, including baseline clinical features such as age and gender, as well as imaging and pathological features. Using the patients with lung cancer and the healthy people in the database, a lung cancer screening model was established based on the GA-SVM algorithm. It includes 8 lung cancer-specific markers: tetrachloroethylene, nonanal, C10H16, toluene, naphthalene, α-pinene, dimethyl succinate and N,N-dimethylacetamide. The model had an AUC of 0.98, a sensitivity of 96.8%, a specificity of 93.4%, and an overall accuracy of 94.8%. Using the patients with lung cancer and those with benign lung disease in the database, the same method was used to establish a probability model of benign and malignant pulmonary space-occupying lesions, including 30 VOCs such as cyclohexane, butyric acid, and acetone. This model had an AUC of 0.65, an accuracy of only 54.15%, and a sensitivity and specificity of 51.06% and 68.29%, respectively. Predictive ability improved after the selected clinical feature variables were added to build an integrated model: the AUC reached 0.776, the overall accuracy 75.56%, and the sensitivity and specificity 78.7% and 68.3%, respectively.

Conclusion: (1) 64 VOCs were identified through SPME-GC-MS analysis of the composition of the exhaled breath. The types of VOCs in the exhaled breath did not differ significantly among patients with lung cancer, patients with benign lung disease and healthy subjects; only their content differed. (2) The exhaled breath data of patients with lung cancer and healthy people were analyzed with the GA-SVM machine learning algorithm. The resulting lung cancer screening model has good predictive ability and is expected to be useful for early screening of lung cancer, and the selected VOCs can serve as potential lung cancer-specific markers. (3) The probability model of benign and malignant pulmonary space-occupying lesions, built by GA-SVM analysis of the exhaled breath data of patients with lung cancer and patients with benign lung diseases, predicts poorly, but it improves significantly once clinical features are added.
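The performance figures quoted in the abstract (sensitivity, specificity, overall accuracy and AUC) follow standard definitions, which the short Python sketch below illustrates. It is not the thesis's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state which subjects each reported figure was computed on.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), with 1 = lung cancer and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity_specificity_accuracy(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = correct/all."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc_from_scores(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data (NOT from the thesis): 4 cancer cases, 4 controls.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold of 0.5

tp, fn, tn, fp = confusion_counts(toy_labels, toy_preds)
print(sensitivity_specificity_accuracy(tp, fn, tn, fp))  # (0.75, 0.75, 0.75)
print(auc_from_scores(toy_labels, toy_scores))           # 0.9375
```

Because accuracy pools both classes, it depends on the class balance of the evaluated sample as well as on sensitivity and specificity, which is why all three are usually reported together with the AUC.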
Keywords/Search Tags: Exhaled breath analysis, Volatile organic compounds, Lung cancer, Probability model, Machine learning algorithm