
Study On The Infectious Regularity Of Patients With Advanced Lung Cancer Based On Data Mining

Posted on: 2018-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Chen
GTID: 2348330515466783
Subject: Biomedical engineering
Abstract/Summary:
Cancer has long been a major threat to human health, and lung cancer ranks among the leading cancers in the growth of both morbidity and mortality. Patients with lung cancer generally have concomitant bacterial or fungal infections. According to incomplete clinical statistics, the vast majority of lung cancer patients eventually die of pathogen infection: most deaths are caused by complications of the cancer, and only a few patients die of the cancer itself. Identifying the type of infecting pathogen is therefore critical to treating infectious complications. At present, pathogens in blood or tissue are identified clinically mainly by targeted bacteriological examination, for example of sputum and lower respiratory tract secretions, of blood and bone marrow, or of urine. Lung cancer patients usually receive bacteriological testing of sputum and lower respiratory tract secretions, and the test cycle is generally up to seven days.

In this paper, we use data mining techniques to analyze the clinical variables of patients with advanced lung cancer and to build a model relating those variables to the pulmonary infectious pathogens (Klebsiella and Candida). The model is then used to predict the pulmonary infectious pathogen. For this purpose, we collected 370 cases, including 222 patients with Candida and 148 patients with Klebsiella. Each case contains 21 clinical indicators. Missing data were handled by simply deleting the samples with missing values. Class imbalance was addressed by over-sampling the minority class with replacement and under-sampling the majority class. Features were extracted with PCA and selected with RF-RFE, and a random forest was used as the classifier to build the classification model. Ten-fold cross-validation was used to estimate the generalization performance of the classification model. Specificity, sensitivity, positive predictive value and negative predictive value were used as the evaluation criteria. The ROC curve was used to visualize generalization ability, which was quantified by the AUC. Comparison experiments were also set up for data sampling, classifier, feature selection and feature extraction.

The experimental results of this paper are as follows:
(1) With Klebsiella treated as the positive class, the model output had the higher negative predictive value and sensitivity.
(2) Simple over-sampling of the minority class with replacement, combined with simultaneous under-sampling of the majority class, achieved good results on this type of imbalanced data and greatly improved the generalization ability of the classifier.
(3) Random forest and support vector machine showed similar generalization ability on the classification problem studied in this paper.
(4) Feature extraction methods such as PCA and ICA significantly improved the generalization ability of the classifier.
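The resampling step and the four evaluation criteria named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the thesis's code: the function names `rebalance` and `binary_metrics`, the target of 185 cases per class (the midpoint of 222 and 148), the random seed, and the use of integer case IDs in place of the 21 clinical indicators are all assumptions made for illustration. The abstract does not state the target class sizes or whether resampling was done inside each cross-validation fold.

```python
import random
from collections import Counter


def rebalance(samples, labels, target_per_class, seed=0):
    """Bring every class to target_per_class cases (hypothetical helper).

    Classes larger than the target are under-sampled without replacement;
    smaller classes keep all original cases and are topped up by drawing
    extra cases with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    out_samples, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_samples, out_labels


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, PPV and NPV for one chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Class counts taken from the abstract; the case IDs are placeholders.
labels = ["Candida"] * 222 + ["Klebsiella"] * 148
case_ids = list(range(len(labels)))
balanced_ids, balanced_labels = rebalance(case_ids, labels, target_per_class=185)
print(Counter(balanced_labels))
```

Because sensitivity and negative predictive value depend on which pathogen is treated as the positive class, `binary_metrics` takes that choice as an explicit argument rather than fixing it.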
Keywords/Search Tags: Lung cancer and infection, Data Mining, Data Preprocessing, Random Forest