| Lung cancer has the highest morbidity and mortality of all cancers in China and seriously threatens human health and quality of life. Because its etiology is complex and its pathogenesis is unclear, lung cancer is difficult to prevent. Early detection, early diagnosis, and early treatment therefore remain the key to improving the survival rate of patients with lung cancer. Among the existing diagnostic methods, pathological diagnosis is the gold standard and provides important guidance for selecting an appropriate treatment plan. However, pathological diagnosis currently suffers from a shortage of pathologists, low diagnostic efficiency, and subjectivity. An intelligent auxiliary diagnostic method is therefore urgently needed to improve the efficiency and accuracy of the pathological diagnosis of lung cancer.

Objectives

To enable the early detection and diagnosis of lung cancer, a risk prediction model based on epidemiological characteristics and clinical symptoms was to be established by logistic regression and used to screen individuals at high risk of lung cancer. Using tissue samples obtained under bronchoscopic or CT guidance, a convolutional neural network was to be applied to intelligent histopathological diagnosis, and a high-accuracy classification and diagnosis model of non-small cell lung cancer was to be established to improve diagnostic accuracy. Proceeding step by step from the preliminary screening of high-risk individuals, to the precise examination of patients with suspected lung cancer, and then to the diagnosis of lung cancer, this workflow was intended to improve the efficiency of lung cancer screening and diagnosis.

Materials and Methods

1. Objects of study

The epidemiological characteristics and clinical symptoms of 1302 samples were collected to construct the risk prediction model
of lung cancer. Among them, there were 405 patients with lung cancer, 444 patients with benign lung disease, and 453 normal controls. The patients with lung cancer and benign lung disease came from the Department of Respiratory Medicine of the First Affiliated Hospital of Zhengzhou University. The normal controls were individuals who had undergone a physical examination. In addition, a total of 2380 whole slide images of lung cancer were collected to construct the histological classification model of non-small cell lung cancer. Among them, 1170 cases were lung adenocarcinoma and 1210 cases were lung squamous cell carcinoma. The whole slide images were obtained from the Department of Pathology of the First Affiliated Hospital of Zhengzhou University.

2. Construction of the risk prediction model of lung cancer

For the dependent variable, patients with lung cancer were assigned to the lung cancer group, while patients with benign lung disease and normal controls were assigned to the non-lung cancer group. The 1302 samples were randomly divided into a training set and a test set at a ratio of 3:1. The epidemiological characteristics and clinical symptoms of the training set were used to establish the risk prediction model of lung cancer by logistic regression, and the test set was used to verify the performance of the model. Individuals at high risk of lung cancer were then screened out.

3. Construction of the histological classification model of non-small cell lung cancer

First, a digital slide scanner was used to digitize the pathological sections, and the cancer area was then manually marked by more than 3 experienced pathologists as the region of interest (ROI). Second, patches were extracted from the ROI by a program with a two-level nested for loop in Python. The resolution of each extracted patch was 224×224. Third, 20000 lung adenocarcinoma patches and 20000 lung
squamous cell carcinoma patches were obtained. The patches were divided into a training set, a validation set, and a test set at a ratio of 7:2:1. Finally, a convolutional neural network built with TensorFlow and Keras was used to establish the histological classification model of lung adenocarcinoma and lung squamous cell carcinoma, and the model was evaluated.

4. Evaluation of models

Sensitivity, specificity, area under the curve (AUC), and accuracy were used as indicators for model evaluation. The closer the AUC is to 1.0, the better the predictive performance of the model. An AUC less than or equal to 0.5 indicates that the model has no diagnostic value.

Results

1. The risk prediction model of lung cancer

(1) Statistical analysis of epidemiological characteristics and clinical symptoms

Statistical analysis of the 1302 samples showed significant differences among the healthy, benign lung disease, and lung cancer groups in 13 epidemiological characteristics and clinical symptoms: age, gender, smoking history, drinking history, history of lung infection, family history of cancer, family history of lung cancer, chest tightness or chest pain, blood in the sputum, expectoration, cough, fever or sweating, and hemoptysis (P<0.05).

(2) Establishment of the risk prediction model of lung cancer by logistic regression

Nine risk factors were included in the risk prediction model of lung cancer: age, smoking history, history of lung infection, family history of cancer, chest tightness or chest pain, blood in the sputum, cough, fever or sweating, and hemoptysis. For the training set, the sensitivity, specificity, accuracy, and AUC (95% CI) were 86.7%, 68.7%, 74.3%, and 0.821 (0.797-0.845), respectively. For the test set, they were 80.2%, 62.8%, 68.3%, and 0.716 (0.663-0.764), respectively.

2. The histological classification model of non-small cell lung cancer

The
number of neurons in the output layer of the standard EfficientNet was changed to 2 for the binary classification of lung adenocarcinoma and lung squamous cell carcinoma. The network was then trained for 150 epochs using the RMSprop optimizer, with an initial learning rate of 0.00002 and a batch size of 30. After 30 minutes of training, the training loss stabilized and training was stopped. At this point, the accuracy was 93.5% on the validation set and 92.4% on the test set. A 1×1 convolutional layer was then added to the EfficientNet network structure to reduce the dimensionality stepwise, so as to reduce redundant features and, to some extent, improve convergence speed and classification accuracy. The EfficientNet model was optimized in the following order: adding 1×1 convolutional layers, selecting the optimizer, and adjusting the dropout, learning rate, and batch size. In the final configuration, two 1×1 convolutional layers were added, Adam was selected as the optimizer, the dropout was set to 0.5, the learning rate to 0.00002, and the batch size to 30. Without cross-validation, the accuracy was 97.6% on the validation set and 96.8% on the test set. With five-fold cross-validation, the prediction accuracies of this model for lung adenocarcinoma and lung squamous cell carcinoma were 96.8%, 97.0%, 96.9%, 96.8%, and 97.0%, respectively, and the average accuracy on the test set was 96.9%.

Conclusion

The risk prediction model of lung cancer was established by logistic regression based on epidemiological characteristics and clinical symptoms. Its accuracies on the training set and the test set were 74.3% and 68.3%, respectively. This model could be applied to screen individuals at high risk of lung cancer. Based on histopathological images of non-small cell lung cancer, the classification model of lung
adenocarcinoma and lung squamous cell carcinoma was constructed with a convolutional neural network. Its accuracies on the validation set and the test set were 97.6% and 96.8%, respectively. This model could effectively identify the histopathological images of non-small cell lung cancer. Proceeding step by step from preliminary screening to the diagnosis of lung cancer, this approach could improve the efficiency of screening and diagnosis. |
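
The patch extraction described under Materials and Methods (a two-level for loop producing 224×224 patches from the ROI) might look roughly like the sketch below. The abstract gives only the loop structure and the patch size. The rectangular ROI bounding box, the non-overlapping stride, the choice to discard partial patches at the border, and the function name `patch_origins` are all assumptions made for illustration.

```python
def patch_origins(x0, y0, x1, y1, patch=224, stride=224):
    """Yield the top-left (x, y) corner of every full patch inside the
    ROI bounding box [x0, x1) x [y0, y1).

    Assumptions (not stated in the abstract): the ROI is approximated by
    a rectangle, patches do not overlap, and partial patches are dropped.
    """
    for y in range(y0, y1 - patch + 1, stride):      # outer loop: rows
        for x in range(x0, x1 - patch + 1, stride):  # inner loop: columns
            yield (x, y)


# Hypothetical ROI of 1000 x 500 pixels, used only to show the tiling.
origins = list(patch_origins(0, 0, 1000, 500))
print(len(origins), origins[0], origins[-1])
```

Each coordinate pair would then be used to crop one 224×224 region from the whole slide image. How the pathologists' annotations were converted into extraction regions is not described in the abstract.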
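
The random splits mentioned in the methods (3:1 for the 1302 samples, 7:2:1 for the 40000 patches) can be illustrated with a small helper. This is a sketch under stated assumptions: the abstract does not say whether the split was stratified by class, which random seed was used, or how non-integer set sizes were rounded. The helper `split_by_ratio` is hypothetical and simply gives the remainder to the last subset.

```python
import random


def split_by_ratio(items, ratios, seed=0):
    """Shuffle items and cut them into consecutive subsets sized by ratios.

    Assumption: a plain (non-stratified) shuffle; the last subset takes
    whatever remains so that no item is dropped by integer rounding.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    subsets, start = [], 0
    for i, ratio in enumerate(ratios):
        if i == len(ratios) - 1:
            end = len(items)
        else:
            end = start + len(items) * ratio // total
        subsets.append(items[start:end])
        start = end
    return subsets


# 40000 patches split 7:2:1 into training, validation and test sets.
train_set, val_set, test_set = split_by_ratio(range(40000), (7, 2, 1))

# 1302 samples split 3:1; exact counts depend on the rounding rule,
# which the abstract does not specify.
risk_train, risk_test = split_by_ratio(range(1302), (3, 1))
```

With 40000 patches the 7:2:1 ratio divides evenly into 28000, 8000, and 4000 patches. For the 1302 samples a 3:1 ratio does not divide evenly, so the set sizes produced here reflect only this sketch's rounding choice.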
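
The evaluation indicators named under "Evaluation of models" (sensitivity, specificity, accuracy, and AUC) can be computed as in the following minimal sketch. The toy labels and scores are invented purely to exercise the functions and are not study data. The 0.5 decision threshold is likewise an assumption, since the abstract does not state the cut-off used. The last lines recompute the average of the five reported cross-validation accuracies.

```python
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold
toy_metrics = binary_metrics(toy_labels, toy_pred)
toy_auc = auc(toy_labels, toy_scores)

# Five-fold accuracies as reported in the Results section.
fold_accuracies = [96.8, 97.0, 96.9, 96.8, 97.0]
mean_accuracy = round(mean(fold_accuracies), 1)
print(toy_metrics, toy_auc, mean_accuracy)
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve and makes the interpretation in the text concrete: a value of 0.5 means positives and negatives are ranked no better than chance.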