Risk Prediction Model Of Esophageal Squamous Cell Carcinoma Based On Cancer Screening Cohort

Posted on: 2022-08-31 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J M Han
GTID: 1484306608479774 | Subject: Cell biology
Abstract/Summary:
Background: Globally, esophageal cancer ranks eighth among common malignant tumors and sixth among causes of cancer-related death. Esophageal cancer is highly fatal and rarely causes symptoms in the early stage. By the time it is detected through symptoms, most patients are already at an advanced stage, when clinical treatment is less effective and more costly. Despite great progress in treatment, the 5-year survival rate of esophageal cancer remains poor even in developed countries; in the United Kingdom, for example, it is less than 20%, which highlights the importance of early detection. There are two main histological subtypes of esophageal cancer: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC). EAC and ESCC differ in their molecular mechanisms. EAC is more common in Western countries, with a rising incidence rate. ESCC accounts for about 90% of esophageal cancer cases and is more common in East Africa, Central Asia and China. Although the incidence rate of ESCC has been observed to decline slightly, it remains fairly high. Endoscopic examination is one of the main modalities for detecting esophageal cancer and precursor lesions, and it is widely used in esophageal cancer screening programs. In 2005, China issued guidelines for the early diagnosis and treatment of cancer, including guidelines for the early detection and treatment of esophageal cancer. The guidelines state that esophageal cancer screening is suitable in rural and urban high-risk areas. Rural residents of high-risk areas aged 40 to 69 years participate in cancer screening at the national esophageal cancer early diagnosis and treatment bases. Screening is performed by endoscopy combined with iodine staining and indicative biopsy. Subsequently, the early diagnosis and early treatment project for upper digestive tract cancer screening began. According to the
screening program, screening efficiency in the initial stage is quite low because there are no assessment tools for selecting the high-risk population. In the follow-up stage, treatment or follow-up recommendations are mainly for people diagnosed with dysplasia or above, while other groups receive no corresponding follow-up or health guidance, and corresponding assessment tools are also rare. In addition, esophageal cancer screening is moving toward precision, individualization and risk-prediction-based approaches, in which prediction models play an important role. Existing prediction models have several defects: models based on case-control studies provide low-grade evidence; cohort-based models have rarely been externally validated, so their effectiveness in other populations is unknown; and models have been based on questionnaire data without considering endoscopy-related risk factors. Therefore, this study aims to develop and validate ESCC prediction models for the two stages of screening and to optimize the esophageal cancer screening program. With increasing knowledge of potential prognostic predictors and the advent of personalized medicine, prediction models play a growing role in population risk stratification and clinical decision-making. However, most prediction models cannot be applied in clinical practice because of methodological defects or other problems. To improve their applicability, it is very important to develop and validate prediction models properly. There is no single optimal method for establishing prediction models; therefore, we explore and propose a standard process for developing clinical prediction models.

Objective: (1) To develop and validate an ESCC prediction model that identifies the high-risk population in the primary stage, before mass endoscopic examination, and thereby improves the detection rate and screening efficiency. (2) To investigate the associations between esophageal lesion
characteristics and the risk of ESCC. (3) To develop and validate ESCC prediction models that identify high-risk populations for follow-up in the follow-up stage. (4) To explore a standard process for developing prediction models and provide a reference for researchers establishing and validating prediction models.

Methods: This study included participants in the upper digestive tract cancer screening cohort in Shandong Province from 2006 to 2019. The data collected comprised basic information, endoscopic examination data, pathological diagnosis data, cancer registration data and cause-of-death registration data. Question defining. Develop and validate ESCC prediction models for different stages of the screening program. Study design. The ESCC prediction models were based on a prospective cohort design: screening data from Feicheng city formed the training cohort, and screening data from other regions of Shandong Province formed the validation cohort. The population included in the final analysis was selected according to the inclusion and exclusion criteria. Data consolidation. Consolidate the raw data, code the data, clean outliers, and form five data sets using multiple imputation. Develop prediction model. Candidate predictors included age, gender, annual household income, BMI, source of water, smoking status, drinking status, tea consumption, fresh fruit consumption, fried food consumption, hot food consumption, digestive disease history, family history of any cancer, number of lesions, lesion size and highest pathological diagnosis. Pairwise interactions between variables were analyzed in multiple regression, and interaction terms with significance levels below 0.001 were included in the final model. Variables were selected using the LASSO method. The ESCC prediction model was established using the classic Cox regression model. The Schoenfeld residual test was used to test the proportional hazards assumption; models were fitted on each of the 5 data sets and the
results were merged using Rubin's rules. Model evaluation. Evaluate the ESCC prediction models using internal and external validation. Evaluation indicators included the variance explained by the model (R2), Harrell's C statistic, the D statistic, the calibration curve and the decision curve. All indexes were calculated in the 5 data sets and the results were merged using Rubin's rules. Model presentation and report. Convert the model into a score model for clinical use. The transformation method is as follows: taking the minimum coefficient in the multiple regression model as the reference, each of the other coefficients is divided by the reference value and converted to an integer, which is the score for that predictor. The scores of the predictors are summed to obtain the total score, and the score model is developed by calculating the risk corresponding to each total score. Evaluate the score models using the proportion of the population classified as high-risk, sensitivity, specificity, Youden's index, accuracy, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (+LR), negative likelihood ratio (-LR) and number needed to be screened (NNBS). Model reporting complied with the TRIPOD statement. Quality control and update of the model. Calculate the calibration bias of the ESCC prediction models and set a significance level; update the model when the significance level is reached.

Results: Cohort description. The training cohort came from Feicheng screening data from 2006 to 2019. After excluding 18483 participants with duplicated ID numbers, 78 with a wrong ID number, 1 with a wrong survey date, 6999 with unfilled questionnaires, 1084 not aged 40-69 years and 1559 with a baseline pathological diagnosis of severe dysplasia or above, or cancer, a total of 59481 participants were included in the training cohort. The median follow-up time of the training cohort was 5.76 years. After 368426.00 person-years of follow-up, 227 new cases of ESCC were diagnosed, an incidence rate of 61.61 per 100000 person-years. After
215969.40 person-years of follow-up in women, 70 new cases of ESCC were diagnosed, an incidence rate of 32.41 per 100000 person-years. After 152456.60 person-years of follow-up in men, 157 new cases of ESCC occurred, an incidence rate of 102.98 per 100000 person-years. For the validation cohort, after excluding 10690 participants with duplicated ID numbers, 40 with a wrong ID number, 104 with a wrong survey date, 3776 with unfilled questionnaires, 7893 without endoscopic examination records, 4312 not aged 40-69 years and 2064 with a baseline pathological diagnosis of severe dysplasia or above, or cancer, a total of 44648 participants were included. The median follow-up time of the validation cohort was 2.92 years. After 129838.30 person-years of follow-up, 55 new cases of ESCC occurred, an incidence rate of 42.36 per 100000 person-years. After 60173.20 person-years of follow-up in men, 49 new cases of ESCC occurred, an incidence rate of 81.43 per 100000 person-years. After 69665.13 person-years of follow-up in women, there were 6 new cases of ESCC, an incidence rate of 8.61 per 100000 person-years.

The risk prediction model for ESCC used in the primary screening stage (Model 1). The variables retained in the model after variable selection were age, gender, annual household income, BMI, smoking status, drinking status, fresh fruit consumption, digestive disease history and family history of any cancer. The R2, Harrell's C statistic and D statistic were 40.06%, 0.75 and 1.36 in the training cohort and 45.33%, 0.83 and 1.86 in the validation cohort, respectively. The model had a calibration slope of 1.01 and an intercept of 0 in the training cohort, and a calibration slope of 1.64 with an intercept of -0.38 in the validation cohort. The model had good discrimination and calibration, and the decision curve showed that it was clinically useful. The model performed well in sensitivity analysis. The model was
transformed into a score model: age (45-49: 9 points; 50-54: 14 points; 55-59: 15 points; 60-64: 16 points; 65-69: 18 points), gender (male: 6 points), smoking status (yes: 1 point), drinking status (yes: 2 points), BMI (<25: 3 points), annual household income (low level: 1 point), fresh fruit consumption (low intake level: 2 points), pickled food consumption (high intake level: 2 points) and history of digestive diseases (yes: 2 points). The score model retained good discriminative ability. The threshold was set to 22 or 23, where Youden's index was largest. Approximately 190 people needed to be screened to detect 1 case of ESCC.

The association between esophageal lesion characteristics and the risk of ESCC. The training cohort was used to analyze the association between the characteristics of esophageal lesions and the risk of ESCC. The association between lesion size and the risk of ESCC: analyses of the overall population, by gender and by age group all showed that the relative risk of ESCC in the group with lesion size <1 cm was lower than that in the group with lesion size ≥
1 cm. The association between the highest pathological diagnosis and the risk of ESCC: the log-rank test showed no statistically significant difference between the cumulative incidence curves of the normal group and the esophagitis group (p=0.93). Multiple regression analysis showed that, in the overall population and in each gender and age group, the confidence interval of the adjusted hazard ratio (HR) for the esophagitis group compared with the normal group contained 1. In the overall population, compared with the normal group, the HR was 3.86 (95% CI, 2.57-5.79) for the mild dysplasia group and 29.45 (95% CI, 20.25-42.83) for the moderate dysplasia group. Results by gender and by age group were consistent, and the relative risk of ESCC was significantly increased in the moderate dysplasia group. The association between the number of lesions and the risk of ESCC: in the overall population, the adjusted HRs for 1, 2 and ≥3 lesions were 2.72 (95% CI, 1.95-3.80), 7.25 (95% CI, 4.89-10.75) and 15.36 (95% CI, 9.88-23.90), respectively. Analyses by gender and by age subgroup showed that the relative risk of ESCC increased with the number of lesions.

The risk prediction models for ESCC used in the follow-up stage (Model 2 and Model 3). Variables in Model 2 included age, sex and number of lesions. Variables in Model 3 included age, sex, smoking status, drinking status, BMI, pickled food consumption, number of lesions, lesion size and highest pathological diagnosis. For Model 2, the R2, Harrell's C statistic and D statistic were 45.35%, 0.81 and 1.86 in the training cohort and 61.73%, 0.89 and 2.60 in the validation cohort, respectively. For Model 3, the R2, Harrell's C statistic and D statistic in the training cohort were 50.75%, 0.83 and 2.08, respectively; the values in the validation cohort
were 63.47%, 0.89 and 2.70, respectively. Model 2 and Model 3 had good discrimination in both the training cohort and the validation cohort. For Model 2, the intercept and slope of the calibration curve were 0 and 1.00 in the training cohort and 0.28 and 1.67 in the validation cohort. For Model 3, they were 0 and 1.00 in the training cohort and 0.10 and 1.51 in the validation cohort. This indicated good agreement between predicted and observed probabilities. The decision curve showed that the models had good clinical usefulness. Sensitivity analysis showed that the models were robust. Model 3 was transformed into a score model: age (45-49: 6 points; 50-64: 9 points; 65-69: 10 points), gender (male: 3 points), smoking status (yes: 1 point), drinking status (yes: 1 point), BMI (<25: 2 points), pickled food consumption (high intake level: 2 points), number of lesions (1: 2 points; 2: 6 points; ≥3: 9 points), lesion size (≥1 cm: 3 points) and highest pathological diagnosis (mild or moderate dysplasia: 6 points).

Standard process for developing prediction models. (1) Question defining. Be clear about the predictors and the outcome. (2) Study design. Weigh the advantages and disadvantages of different research designs and choose the design in light of the actual situation. (3) Data consolidation. The purpose of data consolidation is to generate the data sets used to develop prediction models. (4) Develop models. The selection of candidate variables, the choice of model and the selection of predictors are all important. (5) Model validation. Model validation includes both internal and external validation. Evaluation indexes include R2, discrimination indexes, calibration indexes, the calibration curve and its intercept and slope, the decision curve, sensitivity analysis, etc. (6) Model presentation and
report. The presentation mode of the prediction model should be chosen according to the user, and includes formulas, score models, nomograms, etc. (7) Quality control of the model. As new outcome events accrue, calibration bias can arise. Quality monitoring is used to evaluate the accuracy of the model regularly. (8) Update of the model. Update the model when the calibration bias is significant.

Conclusion: 1. A prediction model of ESCC was developed and validated for primary screening. The model contained the risk factors age, gender, annual household income, smoking status, drinking status, BMI, fresh fruit consumption, pickled food consumption and digestive disease history. The model had good discriminative ability, calibration ability and clinical usefulness in both the training cohort and the validation cohort, and can be applied in primary screening to identify the high-risk population. 2. Lesion characteristics were associated with the risk of ESCC. Lesion size was associated with ESCC: the larger the lesion, the higher the risk. The highest pathological diagnosis was associated with ESCC: the association between esophagitis and ESCC was not statistically significant, whereas the dysplasia group had an increased risk of ESCC. The number of lesions was associated with ESCC: more lesions were associated with a higher risk. 3. Prediction models for ESCC were developed and validated for the follow-up stage. Two prediction models, Model 2 and Model 3, were established. Risk factors in Model 2 included age, gender and number of lesions. Risk factors in Model 3 included age, gender, BMI, smoking status, drinking status, pickled food consumption, lesion size, highest pathological diagnosis and number of lesions. Both models had good discrimination, calibration and clinical usefulness in the training and validation cohorts, and can be applied to follow-up decision-making. 4. Standard steps for developing prediction models were constructed: question defining, study
design, data consolidation, model development, model evaluation, model presentation and reporting, quality control of the model and update of the model.
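
The score-model step described in the Methods, and the Model 1 point table reported in the Results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, rounding to the nearest integer is assumed (the abstract only says an integer was obtained), ages 40-44 are assumed to be the 0-point reference band, a score at or above the threshold is assumed to mean high risk, and the example coefficients are invented purely to show the arithmetic.

```python
"""Illustrative sketch of the Model 1 score model (assumptions noted inline)."""

# Points per age band as reported for Model 1 in the abstract. Ages 40-44 are
# not listed there; treating them as the 0-point reference is an assumption.
MODEL1_AGE_POINTS = [
    (45, 49, 9),
    (50, 54, 14),
    (55, 59, 15),
    (60, 64, 16),
    (65, 69, 18),
]


def coefficients_to_points(coefficients):
    """Divide each coefficient by the smallest one and round to an integer.

    Rounding to the nearest integer is an assumption, not a reported detail.
    """
    reference = min(coefficients.values())
    if reference <= 0:
        raise ValueError("expected positive regression coefficients")
    return {name: round(value / reference) for name, value in coefficients.items()}


def model1_age_points(age):
    """Return the age points for a participant aged 40 to 69."""
    if not 40 <= age <= 69:
        raise ValueError("the screening program covers ages 40 to 69")
    for low, high, points in MODEL1_AGE_POINTS:
        if low <= age <= high:
            return points
    return 0  # ages 40-44, assumed reference band


def model1_score(age, male, smoker, drinker, bmi, low_income,
                 low_fruit_intake, high_pickled_intake, digestive_history):
    """Sum the Model 1 points listed in the abstract for one participant."""
    score = model1_age_points(age)
    score += 6 if male else 0
    score += 1 if smoker else 0
    score += 2 if drinker else 0
    score += 3 if bmi < 25 else 0
    score += 1 if low_income else 0
    score += 2 if low_fruit_intake else 0
    score += 2 if high_pickled_intake else 0
    score += 2 if digestive_history else 0
    return score


def is_high_risk(score, threshold=22):
    """Flag a participant whose score reaches the threshold (22 or 23 in the abstract)."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical coefficients, invented only to demonstrate the conversion.
    print(coefficients_to_points({"smoking": 0.12, "drinking": 0.25, "male": 0.70}))
    # A 62-year-old man who smokes and drinks, BMI 23: 16 + 6 + 1 + 2 + 3 = 28.
    example = model1_score(62, True, True, True, 23.0, False, False, False, False)
    print(example, is_high_risk(example))
```

Under these assumptions the maximum Model 1 score is 37 (18 + 6 + 1 + 2 + 3 + 1 + 2 + 2 + 2), so a threshold of 22 or 23 sits a little above the midpoint of the scale.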
Keywords/Search Tags:screening, esophageal squamous cell carcinoma, endoscopy examination, prediction model