
Application Of Data Mining Predictive Models In Screening Lung Cancer High Risk Individual Among Coke Oven Workers

Posted on: 2015-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: Q D He
GTID: 2284330431996200
Subject: Public health
Abstract/Summary:
Objective
Lung cancer has become a serious threat to human health and a public health problem, because its incidence and mortality rates have been increasing year by year. Data mining technology has been widely studied in the medical field because of its clear advantages in solving large-sample, multi-parameter problems. In recent years, we have been working on the diagnosis of lung cancer. Our preliminary study showed that serum carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), gastrin, sialic acid (SA), copper-zinc ratio (Cu/Zn) and serum calcium have high specificity for lung cancer, and an artificial neural network (ANN) model based on these six tumor markers was established for the auxiliary diagnosis of lung cancer. In this study, we explore whether these six tumor markers can serve as early, effective indicators of health damage in coke oven workers, and we develop the ANN model further and compare it with the C5.0 decision tree model and the support vector machine (SVM) model. We use these models to screen high-risk individuals among coke oven workers and to establish a cohort for future study.

Materials and methods
A group of 111 healthy individuals with no previous history of cancer was recruited consecutively from the First Affiliated Hospital of Zhengzhou University, and a group of 183 coke oven workers was recruited from Anyang Iron and Steel Co., Ltd. The modeling samples came from our preliminary work.

Serum levels of CEA, NSE and gastrin were measured by radioimmunoassay, serum copper and zinc by atomic absorption spectrophotometry, and serum calcium by a fully automated analyzer. Sialic acid was measured by a resorcinol chromogenic method improved by our research group.

The lung cancer patients, patients with benign lung disease and healthy controls were randomly divided into a training set and a test set at a ratio of 3:1.
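The random 3:1 division into training and test sets described above can be illustrated with a minimal sketch. The abstract does not say whether the split was stratified by group, so splitting each group separately is an assumption here, and the function name `stratified_split`, the seed and the group sizes are hypothetical rather than taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into train/test, about 3:1 within each class.

    Assumption: the split is stratified by class; the source only says
    the samples were "randomly divided" at a proportion of 3:1.
    """
    rng = random.Random(seed)

    # Collect the sample indices that belong to each class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical group sizes for illustration only (not the study's data).
labels = ["cancer"] * 40 + ["benign"] * 20 + ["healthy"] * 60
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each group keeps the proportions of lung cancer, benign disease and healthy samples the same in both sets, which matters when the groups differ in size.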
The ANN, SVM and C5.0 algorithms were used to build classification models on the training data; the test set was then classified by each model and the results of the models were compared. ANN, SVM and C5.0 were run in SPSS Clementine 12.0.

All statistical analyses were performed using the SPSS 21.0 statistical package for Windows. Descriptive statistics and hypothesis tests were chosen according to the distribution of the quantitative data. The level of statistical significance was set at α = 0.05.

Results
1. The levels of CEA and Cu/Zn in the occupational group were significantly higher than those in the control group (P<0.05), while the serum calcium level was lower in the occupational group (P<0.05). The CEA level in workers with more than 16 years of seniority was higher than in those with less than 16 years, indicating that seniority affects the CEA level (P<0.05). There were no differences among workers at the oven top, oven side and oven bottom.
2. The sensitivity, specificity and accuracy of the ANN model were 90.91%, 97.92% and 93.81%, respectively, and the results indicated that the reproducibility of the ANN model was good. The sensitivity, specificity and accuracy of the C5.0 model were 93.94%, 91.67% and 88.8%, respectively, and those of the SVM model were 90.91%, 93.75% and 91.36%, respectively. The areas under the ROC curve (AUC) and their confidence intervals for the ANN, C5.0 and SVM models were 0.969 (0.916-1.000), 0.944 (0.892-0.996) and 0.947 (0.897-0.997), respectively. The differences in AUC were not statistically significant (P>0.05).
3. Screening results of the data mining models: sample No.01252 was classified as lung cancer by the ANN model; samples No.01085 and No.01239 were classified as lung cancer by the SVM model; and samples No.01073, No.01144, No.01145, No.01178, No.01238 and No.01239 were classified as lung cancer by the C5.0 model.
Because these classifications have not yet been confirmed by examination, these individuals are recommended for focused observation by the research group.

Conclusion
1. The serum CEA level was higher in the occupational group and increased with working time among coke oven workers. This indicates that CEA can be used as an early indicator of health damage in coke oven workers.
2. The artificial neural network, decision tree and support vector machine models, combined with the six tumor markers used in the auxiliary diagnosis of lung cancer to differentiate lung cancer from benign lung disease and normal controls, can be used to screen high-risk individuals among coke oven workers and provide a basis for further study by the research group.
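The sensitivity, specificity and accuracy reported for each model follow from the counts of correct and incorrect classifications on the test set. The sketch below shows the standard definitions for a two-class view (lung cancer versus everything else). The helper name `binary_metrics` and the counts are hypothetical, chosen only to illustrate the arithmetic. The abstract does not report the underlying confusion matrices, and its accuracy figures may have been computed over all three groups rather than two.

```python
def binary_metrics(y_true, y_pred, positive="cancer"):
    """Return (sensitivity, specificity, accuracy) for one positive class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # cancer correctly flagged
            else:
                fn += 1  # cancer missed
        else:
            if pred == positive:
                fp += 1  # non-cancer wrongly flagged
            else:
                tn += 1  # non-cancer correctly cleared
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical test set: 10 cancer cases (9 detected),
# 20 non-cancer cases (18 correctly cleared). Not the study's data.
y_true = ["cancer"] * 10 + ["other"] * 20
y_pred = ["cancer"] * 9 + ["other"] * 1 + ["other"] * 18 + ["cancer"] * 2

sens, spec, acc = binary_metrics(y_true, y_pred)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

In a screening setting such as this one, sensitivity is the share of true lung cancer cases that a model flags, and specificity is the share of non-cancer cases it correctly clears.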
Keywords/Search Tags: coke oven workers, tumor markers, data mining, early diagnosis