
Prediction Of Hospitalization Costs For Cerebral Infarction In Shanghai Based On Random Forest

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Zhao
GTID: 2514306302472624
Subject: Applied Statistics
Abstract/Summary:
Controlling medical expenses, especially the payments made by medical insurance institutions to medical service institutions, is an important link in the risk control of social medical insurance funds. Cerebrovascular disease is now the leading cause of disability and death in China. With rapid population aging and rapid changes in economic conditions and lifestyle, the incidence of cerebral infarction has increased significantly, placing a heavy financial burden on individuals, families and society. Controlling the cost of hospitalization for cerebral infarction has therefore attracted growing attention from scholars in many fields.

Drawing on statistics and machine learning, scholars have proposed a variety of models to predict the cost of hospitalization for cerebral infarction, among which multiple regression and decision tree models are the most studied. This work has identified variables and modeling methods that have a significant impact on the cost of hospitalization for cerebral infarction. However, because of imperfect information and inadequate model implementation, prediction of this cost has remained limited to simple correlation analysis, and its accuracy and stability have not reached the expected level. The application of statistics, machine learning and ensemble learning to medical expenses is advancing steadily, and how to integrate them effectively has become a major issue in the era of big data.

Existing studies of the inpatient cost of cerebral infarction, together with the empirical analysis in this paper, show that simple multiple regression, which models the correlation between the dependent variable and the independent variables, cannot capture the complex relationship between them. This paper therefore introduces machine learning models to describe the nonlinear relationship. The empirical results show that the regression tree commonly used for cost prediction is a weak learner, and that it does not greatly improve the prediction accuracy of the hospitalization cost of cerebral infarction compared with multiple regression. Further improving the accuracy and stability of cost prediction requires a better learning mechanism and greater learning efficiency, so this paper introduces the random forest algorithm, an ensemble learning method, to predict the cost. Because different cases have different characteristics, this paper also learns the characteristics of patients, makes an accurate judgment about each patient's characteristics, and makes effective predictions based on those characteristics. Finally, this paper proposes an improved random forest model, the k-random forest model, which is based on clustering and CART regression trees.

The paper is organized as follows. First, it introduces the significance of research on the hospitalization expenses of cerebral infarction and the current state of research in China and abroad, reviews the variables that scholars have found to significantly influence these expenses as well as the models and methods used to predict them, and compares the advantages and disadvantages of these methods. It then introduces the decision tree and K-Means clustering algorithms from machine learning, and the Bagging family and the random forest algorithm from ensemble learning, covering their principles, functions and pseudocode. Next, it presents the improved random forest algorithm: how K-Means clustering is introduced into random forest to produce the new k-random forest algorithm, the principle, process and pseudocode of that algorithm, and how to apply it to the prediction of hospitalization expenses for cerebral infarction.

The empirical part uses real hospital inpatient records, which are cleaned, mined and explored. After a usable data set is obtained, feature engineering is carried out, including feature construction, initial feature screening and feature dimension reduction, which yields 20 usable features. Exploratory analysis of these features identifies five cost categories: rehabilitation treatment cost, physical treatment cost, traditional Chinese medicine treatment cost, traditional Chinese medicine orthopedic treatment cost, and other costs. After the total cost is divided into these categories, a prediction model is built for each cost, the optimal prediction model for each cost is selected, and the final model is obtained by integrating the five optimal models. The evaluation indices of the final fusion model are calculated and compared with those of the optimal model that predicts the total cost directly. The fusion model improves on both R² and MAPE. The paper further analyzes the features that have a significant impact on each category of expenses and their business significance.

The paper concludes that the final cost prediction model is stable and accurate for predicting the inpatient cost of cerebral infarction. It also identifies the variables that significantly influence the results, explains their business significance, and puts forward suggestions and prospects. The model is conducive to the transformation of the medical insurance payment mode and the control of hospitalization expenses in the future. It can also make medical costs more reasonable, which helps ensure the quality of medical treatment, improve the competitiveness of hospitals, and reduce the burden of social medical insurance expenditure, and is therefore of great significance for social development and economic stability.
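
The comparison between the fusion model and the direct total-cost model can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that "integrating" the five category models means summing their per-patient predictions, which the abstract does not state, and that R² and MAPE follow their standard definitions. The function names, category keys and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def fuse_predictions(component_preds: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed fusion rule: total cost = sum of the per-category predictions."""
    return [sum(values) for values in zip(*component_preds.values())]


if __name__ == "__main__":
    # Hypothetical per-category predictions for three patients (toy numbers).
    preds = {
        "rehabilitation": [1200.0, 800.0, 1500.0],
        "physical": [300.0, 250.0, 400.0],
        "tcm": [500.0, 450.0, 700.0],
        "tcm_orthopedic": [100.0, 0.0, 150.0],
        "other": [4000.0, 3500.0, 5200.0],
    }
    actual_total = [6000.0, 5100.0, 8000.0]   # hypothetical observed totals
    direct_total = [6500.0, 4600.0, 7400.0]   # hypothetical direct-model output

    fused_total = fuse_predictions(preds)
    for name, y_hat in (("fusion", fused_total), ("direct", direct_total)):
        print(name, "R2 =", r2_score(actual_total, y_hat),
              "MAPE =", mape(actual_total, y_hat))
```

A higher R² and a lower MAPE for the fused predictions than for the direct predictions would correspond to the kind of improvement the abstract reports.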
Keywords/Search Tags: Cerebral Infarction Hospitalization Cost, Ensemble Learning, Random Forest, Generalized Linear Models, K-Means Cluster