Research On Risk Prediction Model Of Early Gastric Cancer Based On Data Mining

Posted on: 2020-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: M M Liu
GTID: 2404330590997683
Subject: Public Health Informatics
Abstract/Summary:
Objective: Gastric cancer is a malignant tumor with high morbidity and mortality in China, but patients with early gastric cancer who undergo immediate radical treatment (surgical resection) have a high 5-year postoperative survival rate. Strengthening the diagnosis and screening of early gastric cancer is therefore key to saving the lives of patients with gastric cancer and improving their quality of life. At present, the detection rate of early gastric cancer in China is very low and urgently needs to be improved. Previous efforts to raise the detection rate have mostly relied on improved clinical techniques, such as gastroscopy and pathological tissue biopsy, which are harmful to patients and therefore suffer from low compliance and acceptance. The purpose of this study was to construct risk prediction models for early gastric cancer using data mining methods based on noninvasive factors, such as basic information, eating habits, recent main symptoms, family and past disease histories, and serological examinations of patients with gastric diseases. In addition, this study analyzed the factors that had an important influence on the risk prediction of early gastric cancer. The ultimate goal is to assist clinical screening for the risk of early gastric cancer before invasive gastroscopy and pathological biopsy, and thereby help improve the detection rate of early gastric cancer.

Methods: The data for this study were obtained from a scientific research project conducted in cooperation with the First Affiliated Hospital of Guangdong Pharmaceutical University. This project collected the results of questionnaire surveys, serological examinations, gastroscopy and pathological tissue biopsy of patients with gastric diseases who visited the departments of gastroenterology of 30 medical institutions in Guangdong Province. The
contents of the questionnaire mainly covered basic information, eating habits, recent main symptoms, family and past disease histories, and serological examinations of patients with gastric diseases. The results of gastroscopy and pathological tissue biopsy were the "gold standard" for the diagnosis of early gastric cancer, so patients were classified as low risk or high risk of early gastric cancer according to their endoscopy and pathological tissue biopsy results. Taking the questionnaire and serological examination results as the predictors and the risk category of early gastric cancer as the target class, this study preprocessed the collected data by cleaning the original data, using correlation analysis to screen the predictors related to the risk category of early gastric cancer, splitting the data into a training set (70%) and a testing set (30%), and addressing class imbalance in the training set with the synthetic minority oversampling technique (SMOTE). Four risk prediction models of early gastric cancer were established based on the C5.0 decision tree (C5.0 DT), tree augmented naive Bayesian network (TAN), multilayer perceptron (MLP) and support vector machine (SVM). Each of the four models calculated the importance of each relevant indicator affecting the risk prediction of early gastric cancer. The performance of the four models was evaluated with the confusion matrix, accuracy, sensitivity, the area under the receiver operating characteristic curve (AUC) and gain charts.

Results: After cleaning the original data, this study included 798 patients with gastric diseases and selected 24 predictors associated with the risk category of early gastric cancer. Four risk prediction models of early gastric cancer were established on the balanced training set using data mining methods: C5.0 DT, TAN, MLP and SVM. The performance of each model on the testing
set was evaluated. The accuracy of the four models was similar. The C5.0 DT and TAN had larger AUC values and better gains, and both can be readily interpreted in clinical practice. The TAN and MLP predicted patients at high risk of early gastric cancer more accurately. The SVM performed very poorly on every index. Therefore, the TAN established in this study is the relatively optimal risk prediction model of early gastric cancer, followed by the C5.0 DT, the MLP and the SVM. In addition, this study analyzed the importance of each relevant indicator affecting the risk prediction of early gastric cancer, as calculated by the three better-performing models. Ten predictors that have an important influence on the risk of early gastric cancer were selected, including past Barrett's esophagus, past atypical hyperplasia or epithelial neoplasia, past gastric ulcer, eating fruit frequently, past Hp infection and recent main symptoms of acid reflux.

Conclusion: This study integrated multiple noninvasive factors to establish four risk prediction models of early gastric cancer. In a comparative evaluation, the TAN was the relatively optimal risk prediction model of early gastric cancer. In addition, this study selected 10 indicators that had an important influence on the risk prediction of early gastric cancer. The optimal model and these important indicators can help clinicians quickly assess patients' risk of early gastric cancer and draw the attention of doctors and patients. Doctors can then confirm the diagnosis in patients whom the model predicts to be at high risk by further gastroscopy and pathological tissue biopsy; the whole process forms a graded screening strategy for early gastric cancer. This strategy does little harm and has high compliance, which helps improve the detection rate of early gastric cancer in primary medical units. From the point of view of patients and healthy people, the results of this study
can prompt them to improve their living and eating habits and to visit medical units regularly for examinations, so that gastric cancer can be prevented or diagnosed as early as possible. This study may also help clinical researchers select and build optimal risk prediction models of early gastric cancer and assess the important indicators for predicting the risk of early gastric cancer.
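
As a rough illustration of two steps named in the Methods, SMOTE oversampling of the training set and evaluation by accuracy, sensitivity and AUC, the following is a minimal pure-Python sketch. It is not the thesis's implementation, and the abstract does not say which software was used. The function names (`smote`, `confusion_counts`, `accuracy`, `sensitivity`, `auc`), the parameter `k`, the label coding (1 = high risk, 0 = low risk) and any data passed in are assumptions made only for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Sketch of SMOTE: build n_new synthetic minority rows.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    Apply this to the training set only, never to the testing set.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (assumed: high risk) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all cases that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Share of truly positive cases that the model flagged as positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with ten made-up cases of which two are high risk, a model that labels every case low risk reaches an accuracy of 0.8 but a sensitivity of 0. This is one reason to report sensitivity and AUC alongside accuracy when the classes are imbalanced.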
Keywords/Search Tags:Early Gastric Cancer, Risk Prediction, C5.0 Decision Tree, Tree Augmented Naive Bayesian Network, Multilayer Perceptron, Support Vector Machine, SMOTE