Chronic obstructive pulmonary disease (COPD) is a common chronic disease with a high prevalence; it seriously harms population health, carries a heavy disease burden, and has become one of the major public health problems affecting national economic and social development. Recurrent, unplanned readmissions, in which patients return to the hospital within 30 days of discharge and are readmitted for the same or a related condition, have become a worldwide challenge because of concerns about quality of care, health outcomes, and cost. Predicting frequent, preventable readmissions and understanding the factors that contribute to them is a key issue that is being studied extensively. Readmission risk prediction and risk stratification for patients with COPD can give healthcare providers reliable data to support a more rational and balanced allocation of healthcare resources and to improve the utilization of inpatient services. Disease early warning and risk stratification also make early prevention and effective intervention possible for patients, which promotes health and reduces the economic burden. In response to this growing demand, this study provides a framework that combines structured and unstructured medical data to develop predictive models for COPD readmission and patient-specific risk classification. The specific work is as follows.

Firstly, to handle the textual data, named entities in electronic medical records were identified. We defined the entity types and annotated the data. A named entity recognition model based on BiLSTM-CRF was constructed to extract valid information from the electronic medical records, and its recognition results were compared with those of CRF to verify the model's effectiveness.

Secondly, we predicted readmission for patients with COPD. Combining features from the structured data with features obtained from the textual data, prediction models were constructed using logistic regression, support vector machine, random forest, XGBoost, and back-propagation (BP) neural network. XGBoost performed best on all metrics compared with the other models. The predictors of the XGBoost model were analyzed, and the five most important predictors of readmission risk were found to be length of stay, Charlson comorbidity index, disease course, white blood cells, and eosinophils.

Finally, we stratified the risk of readmission for patients with COPD. k-means clustering was applied to the probabilities predicted by the selected XGBoost model to create five probability groups. Moving from low to high confidence, one or two groups of data were progressively removed from the training set for manual evaluation, and the remaining samples were predicted by the XGBoost model. As a result, the performance of the prediction model gradually increased. Compared with the equal-interval stratification method, k-means saves more manual evaluation effort. This risk stratification method provides a decision aid for medical decision making.
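
The stratification step, in which predicted probabilities are clustered into five groups with k-means, can be illustrated with a minimal sketch. It is not the study's implementation: the probabilities are synthetic stand-ins for XGBoost outputs, the names `kmeans_1d` and `assign_group` are hypothetical, and the evenly spread initial centroids are an assumption made so that the run is deterministic.

```python
import random


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on scalar values; returns the centroids in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from evenly spread order statistics (requires n >= k).
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to its cluster mean
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return sorted(centroids)


def assign_group(p, centroids):
    """Index of the centroid nearest to p (0 = lowest predicted risk)."""
    return min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))


# Synthetic probabilities standing in for model output (not study data).
rng = random.Random(0)
probs = [rng.random() for _ in range(200)]

centroids = kmeans_1d(probs, k=5)
groups = [assign_group(p, centroids) for p in probs]
```

Because the centroids are sorted, a higher predicted probability never lands in a lower-numbered group, so the group index can be read as an ordinal risk level. Unlike equal-interval bins, the group boundaries follow the density of the predicted probabilities.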