Risk Prediction Of Liver Cirrhosis Complicated With Upper Gastrointestinal Bleeding Based On Random Forest

Posted on: 2018-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2334330536974422
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: To construct a random forest model that predicts upper gastrointestinal bleeding in patients with liver cirrhosis from their clinical data. The model estimates each patient's probability of upper gastrointestinal bleeding, so that patients who are likely to bleed can be identified and the course of the disease anticipated. Avoiding the triggers of upper gastrointestinal bleeding, together with early intervention and preventive treatment, can then reduce the incidence and fatality rate of liver cirrhosis complicated with upper gastrointestinal bleeding.

Methods: The medical records of patients with cirrhosis admitted to the Department of Gastroenterology of The First Clinical Hospital in Shanxi Medical University from January 2008 to December 2016 were collected retrospectively. The records included basic information, past disease history, complications, clinical manifestations and signs on admission, and the admission routine blood test, blood biochemistry, related antigens, coagulation function tests and other information. The chi-square test and t test were used to screen for variables associated with upper gastrointestinal bleeding, and variables that are themselves manifestations of upper gastrointestinal bleeding were removed by clinical experts according to clinical practice. The data set was randomly divided into a training data set, a verification data set and a test data set at a ratio of 3:1:1. Logistic regression, decision tree and random forest prediction models were built on the training data set, and the verification data set was used to compare the predictive performance of decision tree and random forest models under different parameter settings. Finally, the accuracy, sensitivity, specificity, positive predictive value, negative predictive value and AUC of the three final models were evaluated and compared on the test data set.

Results: Variable screening left 21 variables for final modeling: Child-Pugh classification, nausea, ventosity, edema, ascites, shifting dullness, history of upper gastrointestinal bleeding, history of splenectomy, total protein, albumin, total bilirubin, alkaline phosphatase, glutamyl transpeptidase, glucose, cholesterol, urea nitrogen, serum potassium, prothrombin percent activity, activated partial thromboplastin time, carcinoembryonic antigen and CA19-9 antigen. On the test data set, the logistic regression model achieved an accuracy of 75.10%, sensitivity of 78.00%, specificity of 74.10%, positive predictive value of 52.00%, negative predictive value of 90.40% and AUC of 0.720. The decision tree model was tuned on the verification data set, and its optimal parameters were as follows: information entropy (information) was used as the criterion for selecting the split attribute, the post-pruning complexity parameter CP was 0.026, and the loss matrix was set to C(0,3,1,0). On the test data set, the decision tree model achieved an accuracy of 75.10%, sensitivity of 78.00%, specificity of 74.10%, positive predictive value of 52.00%, negative predictive value of 90.40% and AUC of 0.720. The random forest model was likewise tuned on the verification data set, and its optimal parameters were as follows: the number of trees (ntree) was 500 and the number of randomly selected features (mtry) was 4. On the test data set, the random forest model achieved an accuracy of 88.90%, sensitivity of 64.00%, specificity of 97.80%, positive predictive value of 91.40%, negative predictive value of 88.30% and AUC of 0.909. Comparing these indexes and the ROC curves, the random forest model had the best predictive performance for upper gastrointestinal bleeding in cirrhosis.

Conclusion: The random forest model is superior to the decision tree and the traditional logistic regression model in predicting upper gastrointestinal bleeding in patients with liver cirrhosis. The probability of upper gastrointestinal bleeding in patients with liver cirrhosis can be predicted from their disease history, complications, clinical manifestations and signs on admission, admission routine blood test, blood biochemistry, related antigens and coagulation function tests. The model can provide a reference for further intervention and preventive treatment.
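The 3:1:1 split and the test-set indexes described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's own code (the ntree, mtry and CP parameter names suggest the original analysis used other tooling); the function names `split_3_1_1` and `binary_metrics`, the fixed seed, and the coding of bleeding as 1 and no bleeding as 0 are all assumptions made for illustration. AUC is left out because it requires predicted probabilities rather than class labels.

```python
import random


def split_3_1_1(records, seed=0):
    """Shuffle records and split them into training, verification and
    test sets at a 3:1:1 ratio (hypothetical helper, not from the thesis)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 3 // 5   # three fifths for training
    n_verify = n // 5      # one fifth for verification
    train = shuffled[:n_train]
    verify = shuffled[n_train:n_train + n_verify]
    test = shuffled[n_train + n_verify:]  # remainder, about one fifth
    return train, verify, test


def binary_metrics(y_true, y_pred):
    """Compute the five label-based indexes from true and predicted labels.
    Assumes 1 = bleeding (positive) and 0 = no bleeding (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }
```

Under these assumptions, a model would be fitted on `train`, its parameters compared on `verify`, and `binary_metrics` called once on the labels predicted for `test`. Sensitivity and positive predictive value can move in opposite directions, which is consistent with the random forest's lower sensitivity but higher positive predictive value in the reported results.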
Keywords/Search Tags: random forest, decision tree, liver cirrhosis complicated with upper gastrointestinal bleeding, disease risk prediction