Risk Assessment And Diagnosis And Different Tissue Type Of Lung Cancer Based On Data Mining Technology

Posted on: 2020-06-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Wang
GTID: 1364330575451531
Subject: Internal Medicine
Abstract/Summary:
Lung cancer is a malignant disease with a complex etiology that may involve genetic factors, chronic infection, environmental pollution, and lifestyle. Its morbidity and mortality have remained persistently high, making it one of the most important malignant tumors worldwide and a major public health problem that seriously threatens human health. In China, according to the latest statistics, the incidence of lung cancer in 2015 was much higher than that of other malignant tumors, accounting for 20.03% of new malignant tumor cases nationwide, or about 790,000 cases. Lung cancer ranks first in incidence among malignancies in men and is the leading cause of cancer death in both men and women. The early diagnosis of lung cancer has therefore become a frontier topic worldwide, studied by many disciplines.

Public health research on the epidemiology of lung cancer has found that smoking history, gender, age, and heredity are all related to its incidence and can serve as reference indicators for clinicians diagnosing lung cancer. Smoking is one of the main causes of lung cancer. International differences in the incidence and trends of lung cancer largely reflect differences in the stage and extent of the tobacco epidemic. In the United States, the United Kingdom, Denmark, and other Western countries, the tobacco epidemic began earlier than elsewhere, and the incidence of lung cancer peaked in the middle of the last century. Since then, lung cancer morbidity and mortality in men have gradually declined, while rates in women have remained relatively stable. In countries where the tobacco epidemic began later, such as China, Indonesia, and some countries in Africa, the incidence of lung cancer is still increasing. Chinese expert consensus recommends low-dose CT (LDCT) screening for people who have smoked ≥20 pack-years, including former smokers who quit smoking
for less than 15 years, with passive smokers also included in the high-risk group.

Imaging research mainly aims to develop more accurate diagnostic tools for visible lung cancer lesions, such as LDCT, MRI, and PET-CT. In recent years in particular, radiomics has extended imaging toward the genetic diagnosis of lung cancer. Because CT is sensitive to lung lesions and relatively inexpensive, lung cancer imaging still relies mainly on CT: LDCT is used for screening high-risk populations, while multi-slice CT (MSCT) and high-resolution CT (HRCT) are used mostly for diagnosis and follow-up observation. With the increasing use of LDCT and HRCT, the detection rate of pulmonary nodules has risen. The imaging features of pulmonary nodules commonly include the spicule sign, lobulated sign, pleural indentation, vessel convergence sign, ground-glass opacity, acanthoid processus, vacuole sign, and air bronchogram sign. A single imaging feature is rarely sufficient to diagnose lung cancer, so a combination of multiple imaging features is needed.

The molecular-level techniques currently used in clinical practice include serum tumor markers, sputum cytology, and genetic testing. Serum tumor markers are often used for tumor diagnosis, treatment monitoring, and prognosis. However, because the sensitivity and specificity of any single tumor marker for diagnosing lung cancer are not high, single markers are not recommended as a tool for early detection or screening. Multiple tumor markers are often combined to improve the sensitivity of lung cancer detection.

The clinical diagnosis of lung cancer often requires combining epidemiology, clinical symptoms, serum tumor markers, and imaging features to improve diagnostic accuracy. The diagnostic process therefore generates data that are large in volume, highly varied, and partly fuzzy or uncertain. How to make better use of
complicated medical data, uncover the regularities in them, and screen out the important indicators for lung cancer diagnosis and treatment is a question that researchers across disciplines urgently need to address with new technologies in order to achieve early diagnosis of lung cancer. Compared with traditional statistical methods, data mining technology places fewer requirements on data types and can extract valuable information hidden in large volumes of random, fuzzy, incomplete, and noisy data. In this study, a variety of data mining algorithms were used to establish models for risk assessment, diagnosis, and pathological tissue typing of lung cancer. The models were compared to identify the best-performing ones and to provide a reference for the early diagnosis of lung cancer.

Objective

A lung cancer risk assessment system is established from the collected epidemiological and clinical symptom data of lung cancer, using logistic regression combined with data mining technology. On this basis, further parameters are added: serum tumor markers as microscopic molecular indicators and lung CT image features as macroscopic imaging indicators. These multi-dimensional data are combined with data mining technology to establish a lung cancer diagnosis model and a lung cancer tissue classification model. A genetic algorithm is used to screen the variables, and a decision tree C5.0 model is then established to improve the efficiency and reduce the cost of the model. The aim is to improve the accuracy of lung cancer diagnosis and tissue typing by establishing the best-performing diagnosis model and tissue classification model. The resulting step-by-step, progressive system, which screens high-risk populations, distinguishes benign from malignant lung diseases, and diagnoses lung cancer tissue type, is intended to assist clinicians in making decisions.

Methods

1. From October 2014 to October 2016, data were collected from the Department of Respiratory Medicine of the First Affiliated Hospital
of Zhengzhou University. After informed consent was obtained, the epidemiological and clinical symptom data of patients with benign lung diseases and lung cancer were collected by professional investigators. Blood samples were collected, and the patients' lung CT images were obtained from the radiology department. The data of the normal control group were taken from the physical examination department of the First Affiliated Hospital of Zhengzhou University.

2. Detection of serum tumor markers: The serum concentrations of human vascular endothelial growth factor (VEGF) and human gastrin-releasing peptide precursor (ProGRP) were measured with enzyme-linked immunosorbent assay kits, and the serum concentrations of cytokeratin 19 fragment (CYFRA21-1), carcinoembryonic antigen (CEA), and neuron-specific enolase (NSE) were measured with chemiluminescence immunoassay kits.

3. Quantitative scoring of lung CT images by radiologists: Three radiologists with intermediate or higher professional titles independently scored 21 features according to the lung CT quantitative score scale: lesion size (diameter), density, marginal condition, location, hole sign, burr sign, vascular notch, lobulated sign, spinous process, ground-glass changes, pleural indentation, pleural infiltration, mediastinal shift, mediastinal lymphadenopathy, intrapulmonary metastasis, emphysema, calcification, tracheal stenosis, satellite lesions, lung astigmatism, and pleural effusion. To reduce differences arising from individual experience, the three sets of scores were combined, taking the mode for categorical variables and the mean for continuous variables. The lung cancer diagnosis model was restricted, after screening, to small lesions with a diameter of 3 cm or less and used density, marginal condition, location, hole sign, cavitation, spicule sign, vascular notch, lobulated sign, acanthoid processus, ground-glass opacity, pleural indentation, mediastinal shift, intrapulmonary metastasis, emphysema, calcification, tracheal stenosis, pleural infiltration, and
satellite lesions, for a total of 18 features. The lung cancer tissue typing diagnosis model analyzed a total of 20 features, including lesion size, density, marginal condition, hole sign, cavitation, spicule sign, vascular notch, lobulated sign, acanthoid processus, ground-glass opacity, pleural indentation, pleural infiltration, mediastinal shift, intrapulmonary metastasis, emphysema, calcification, tracheal stenosis, and satellite lesions.

4. Establishment of the models: The epidemiological and clinical symptom data were used to establish a multi-class Logistic regression model for the lung cancer risk assessment system. The samples were randomly divided into a training set and a prediction set at a 3:1 ratio, and artificial neural network (ANN), decision tree C5.0, and support vector machine (SVM) models were built and their prediction performance compared using diagnostic test evaluation indicators. A genetic algorithm (GA) was used to optimize variable screening and was combined with decision tree C5.0 to build a model and evaluate the optimization effect. Logistic regression was run in SPSS 21.0; the artificial neural network, decision tree C5.0, and support vector machine were run in SPSS Clementine 12.0; and the genetic algorithm was run in MATLAB R2014a.

5. Statistical analysis: Measurement data conforming to a normal distribution are expressed as x̄ ± s and were compared between groups with the two-independent-samples t test. Data not conforming to a normal distribution are expressed as M (P25, P75) and were compared between groups with a non-parametric test. Categorical data were compared between groups with the χ² test or Fisher's exact probability method. The test level was α = 0.05, and these analyses were performed in SPSS 21.0. The area under the receiver operating characteristic curve (AUC) was calculated in MedCalc V11.6.0.0 to evaluate the predictive power of each model.

Results

1. Lung cancer risk assessment
system: Logistic regression, ANN, and decision tree C5.0 models established from 14 variables covering epidemiology and clinical symptoms were compared. All three models recognized the normal group well, with no statistically significant difference among them (P=0.057). ANN and decision tree C5.0 did not differ significantly in their ability to identify the benign lung disease group (P=0.643), and both recognized this group better than the Logistic regression model (P=0.003, P=0.001). For the lung cancer group, decision tree C5.0 performed better than both Logistic regression and ANN, and the differences were statistically significant (both P<0.001), whereas the difference between Logistic regression and ANN was not statistically significant (P=0.769).

2. Lung cancer diagnosis system: The first group of data consisted of the eighteen lung CT imaging features together with epidemiology and clinical symptoms; the second group added serum tumor marker data to these. Lung cancer diagnosis models were established with ANN, decision tree C5.0, and SVM and compared with the models established after genetic algorithm (GA) optimization; the differences were not statistically significant (all P>0.05). The SVM2-2 model established from the second group of data had an accuracy of 100% and distinguished benign lung disease from lung cancer to the greatest extent. The C5.0 2-1 model established from the first group of data had an AUC of 0.918. After the first group of data was optimized by the genetic algorithm, the accuracy rate decreased slightly, but the difference from C5.0 2-1 was not statistically significant (P=0.3936). After optimization of the second group of data, accuracy and sensitivity increased, indicating that the genetic algorithm can reduce the number of variables, save resources, and improve the model's ability to identify lung cancer. The study
found that coding categorical variables into multiple classes improves the model's prediction accuracy more than simple two-class coding.

3. Lung cancer typing diagnostic system: The first set of data consisted of the twenty lung CT imaging features; the second set combined imaging features with epidemiology and clinical symptoms; the third set added serum tumor markers to these. ANN and SVM lung cancer tissue typing models were established and compared with each other, and the differences were not statistically significant (all P>0.05). The SVM3-3 model established from the third set of data had the highest accuracy and AUC, and the ANN3-2 model established from the second set of data had the highest sensitivity and detected lung adenocarcinoma to the greatest extent. The ANN3-1 model established from the first set of data, the SVM3-2 model established from the second set, and the ANN3-3 and SVM3-3 models established from the third set all had a specificity of 91.67%, indicating a high ability to recognize squamous cell carcinoma. The third set of data was optimized with the genetic algorithm to establish the decision tree C5.0 3-3 model, which differed significantly from the SVM3-3 model (P=0.0128). Although the accuracy of the C5.0 3-3 model on the prediction set was low, its accuracy on the training set was not reduced, reaching 98.61%, and its sensitivity was 87.50%, so it retains some ability to recognize lung adenocarcinoma. The combination of multiple models can therefore improve the prediction accuracy of the lung cancer typing diagnosis model. The results showed that, for lung cancer tissue typing, the multi-dimensional model incorporating serum tumor markers was superior to models based on imaging alone or on imaging combined with epidemiology and clinical symptoms.

Conclusion

1. The decision tree C5.0 model among the three lung cancer
diagnosis system models constructed has the highest screening accuracy for lung cancer, and the data mining models, ANN and decision tree C5.0, are better than the Logistic regression model at recognizing benign lung disease.

2. Epidemiological characteristics, serum tumor markers, and lung CT manifestations are each difficult to use alone to distinguish benign from malignant disease. The lung cancer diagnosis system established by combining them with data mining technology can reduce the number of variables, save resources, improve the diagnosis rate, reduce the misdiagnosis rate, and provide diagnostic ideas for clinicians.

3. The lung cancer tissue typing diagnosis model established by combining multiple data mining techniques can improve prediction accuracy. It is expected to become a new model for lung cancer tissue typing diagnosis and to provide a non-invasive method of tissue type diagnosis for patients who cannot tolerate biopsy or have contraindications to biopsy.
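
Three data-handling steps named in the Methods (merging the three radiologists' scores by mode and mean, the random 3:1 split into training and prediction sets, and the diagnostic test evaluation indicators) can be illustrated with a minimal Python sketch. This is not the study's implementation, which ran in SPSS, SPSS Clementine, and MATLAB. The function names, feature names, and example values below are hypothetical, and tie-breaking for the mode (first value encountered, as in Python 3.8+) is an assumption the abstract does not specify.

```python
import random
from statistics import mean, mode


def merge_ratings(ratings, categorical):
    """Combine per-rater feature dicts: mode for categorical, mean for continuous."""
    merged = {}
    for feature in ratings[0]:
        values = [r[feature] for r in ratings]
        # Assumption: on a tie, statistics.mode returns the first value seen.
        merged[feature] = mode(values) if feature in categorical else mean(values)
    return merged


def split_3_to_1(samples, seed=0):
    """Randomly split samples into training and prediction sets at about 3:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy for a binary diagnosis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Undefined ratios (empty class) are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical scores from three radiologists for one lesion.
raters = [
    {"diameter_cm": 2.0, "spicule_sign": 1},
    {"diameter_cm": 2.4, "spicule_sign": 1},
    {"diameter_cm": 2.2, "spicule_sign": 0},
]
merged = merge_ratings(raters, categorical={"spicule_sign"})

# Hypothetical labels: 1 = lung cancer, 0 = benign lung disease.
train, test = split_3_to_1(range(8))
metrics = diagnostic_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(merged, len(train), len(test), metrics)
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same prediction set.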
Keywords/Search Tags:lung cancer, data mining technology, tumor markers, computed tomography, joint diagnosis