
Several Machine Learning-Aided Minimally Invasive Methods for the Differential Diagnosis of Indeterminate Pulmonary Nodules

Posted on: 2024-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H C Luo
Full Text: PDF
GTID: 1524307079452364
Subject: Biomedical engineering
Abstract/Summary:
Lung cancer has the highest incidence and mortality rate among all types of cancer worldwide. In China, the 5-year survival rate of lung cancer is only 16%. Early diagnosis can significantly improve the outcome of lung cancer patients. Low-dose computed tomography (LDCT) is internationally recognized as an effective method for the early diagnosis of lung cancer. Studies conducted in the US and Europe have demonstrated that lung cancer screening with LDCT reduces lung cancer mortality by 20%-25%. As health awareness improves, the number of LDCT examinations is increasing, and with it the detection rate of pulmonary nodules. Pulmonary nodules do not always indicate lung cancer: over 96% of positive screening results are false positives, and 72% of those screened require follow-up, among which 18.5% are indeterminate pulmonary nodules (IPNs); 12-50% of IPNs removed by surgery are benign. This high false-positive rate causes overdiagnosis, overtreatment, waste of medical resources, and increased anxiety in the people examined, so the differential diagnosis of IPNs is an important clinical and social problem.

In addition to invasive examinations, traditional pulmonary nodule diagnosis techniques mainly involve LDCT imaging features, assessment of high-risk clinical factors for lung cancer, and functional imaging, among others. In recent years, there has been an increasing number of reports on novel diagnostic markers for pulmonary nodules, which mainly detect differences in tumor-associated nucleic acids, proteins, cells, and other molecules from airway epithelial cells, exhaled breath, alveolar lavage fluid, sputum, blood, saliva, and urine. As a branch of artificial intelligence, machine learning has been increasingly used to assist both traditional and novel pulmonary nodule diagnosis techniques and improve their diagnostic performance, yet it has not been able to meet the clinical requirements of IPN differential diagnosis. The differential diagnosis of IPNs therefore remains a challenging technical problem.

Detection of exhaled volatile organic compounds (VOCs), peripheral blood platelet characteristics, serum Raman spectral features, and the peripheral blood T-cell receptor (TCR) repertoire are four novel methods for disease diagnosis. Early studies have shown them to have potential value for the differential diagnosis of IPNs. To address the clinical and technical issues of IPN differential diagnosis, and building on this preliminary research, this dissertation proposes using machine learning to aid several minimally invasive methods for the differential diagnosis of indeterminate pulmonary nodules. The purpose of this study is to screen for diagnostic markers of malignant and benign IPNs and to construct classifiers with machine learning methods, providing an auxiliary means for the accurate identification of IPNs. In addition, it attempts to elucidate the clinical applicability of the newly constructed IPN classifiers, explore the biological basis of Raman spectroscopy detection, systematically analyze the motif differences between the TCRs of benign and malignant IPNs, and provide a web server for the IPN diagnostic models.

The main research content and results of this doctoral dissertation are as follows:

1. This dissertation completed VOC detection in 338 IPN patients and found that a combination of four substances, namely 6-methyl-5-hepten-2-one, butanol, methyl sulfide, and tetrachloroethylene, could be integrated by the gausspr Poly algorithm into a model, named Lung Voc Doc, that diagnoses IPNs accurately and stably. Its diagnostic AUCs on two independent validation sets were 0.6 and 0.72, respectively, and its positive predictive value was 0.8 on all independent validation sets.

2. This dissertation completed peripheral blood platelet characteristic detection and conventional medical information mining for 419 IPN patients, and found that a combination of five features (age, p PLT, p PCT, b PCT, and pulmonary nodule diameter), integrated by the XGBoost algorithm, could be used to diagnose IPNs. The resulting model, named the SCHC model, had an AUC of 0.72 in the internal validation group. The SCHC model performed well in IPN diagnosis for individuals over 60 years of age or males with pulmonary nodules of 20-30 mm (AUC > 0.8). However, the model generalized poorly to the independent test datasets from other centers, which may be caused by differences in extraction and detection instruments and indicates a direction for further work before wider adoption.

3. This dissertation completed serum Raman spectral feature detection for 883 patients, of whom 663 were IPN patients. An ANOVA test was used to screen for differential spectral features, and an SVM model (named Lung Ra Doc) based on these differential features was constructed to differentiate the benign, malignant, and healthy groups. In the independent validation group, the diagnostic AUC of Lung Ra Doc for distinguishing IPNs was 0.89 and the positive predictive value was 0.93, indicating excellent and stable performance and potential for assisting IPN diagnosis in clinical practice. The diagnostic efficiency of Lung Ra Doc was not affected by important clinical factors such as clinical stage, pathological type, lesion size, and ground-glass nodule (GGN) status of the lesions. Furthermore, the higher the model's predicted value, the higher the clinical stage, suggesting that the model may be valuable for evaluating disease malignancy and monitoring treatment. In addition, the method performed well in lesions smaller than 10 mm. Proteomics showed that the effectiveness of Lung Ra Doc may be attributed to cytoskeletal proteins in the serum, which are stable and thus not affected by sample storage time.

4. This dissertation completed TCR repertoire profiling of 109 patients, including 99 IPN patients. The proposed TCRnodseek, integrated with SVM algorithms, can accurately diagnose IPNs with an independent validation AUC of 0.80 and a positive predictive value of 0.95, providing a promising auxiliary tool for the precise diagnosis of pulmonary nodules. This method is highly interpretable and requires fewer samples.

5. This dissertation also constructed a web server hosting five IPN prediction models, which can be accessed online with assured data security and support for parallel access, providing necessary platform support for lung cancer prevention and control.

In conclusion, this dissertation employed machine learning methods to assist chemistry-based VOC detection, hematology-based platelet feature detection, physics-based Raman spectroscopy detection, and immunology-based TCR repertoire detection, developing multiple models with potential clinical applications. Among these, Lung Ra Doc and TCRnodseek better address the clinical problem of differential diagnosis of IPNs. The dissertation also gave a preliminary account of the potential biological basis of Raman spectroscopy and TCR repertoire detection, and suggested possible algorithms for constructing exhaled-breath VOC models. This study will provide reference and guidance for both basic research and clinical practice concerning IPNs.
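The results above are reported as AUC and positive predictive value (PPV). As a reading aid, the following is a minimal sketch of how these two metrics can be computed from a classifier's output scores. It is not taken from the dissertation: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a malignant case outscores a benign one.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney U formulation of the ROC AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def positive_predictive_value(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """PPV = TP / (TP + FP) among cases called malignant at the threshold."""
    called = [y for y, s in zip(labels, scores) if s >= threshold]
    if not called:
        raise ValueError("no case was called positive at this threshold")
    return sum(called) / len(called)


# Made-up toy data (hypothetical): 1 = malignant, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.55]

auc = roc_auc(labels, scores)
ppv = positive_predictive_value(labels, scores, threshold=0.5)
print(f"AUC = {auc:.3f}, PPV = {ppv:.3f}")  # AUC = 0.867, PPV = 0.800
```

AUC does not depend on a threshold, whereas PPV depends on both the chosen threshold and the proportion of malignant cases in the cohort, so PPV values from cohorts with different malignancy rates are not directly comparable.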
Keywords/Search Tags: Indeterminate Pulmonary Nodules, Machine Learning, Raman Spectroscopy, Platelets, T-cell receptor repertoire