Stroke is the leading cause of death and disability among Chinese residents, and it is characterized by high recurrence and mortality rates. Over the past 30 years, the incidence of stroke in China has continued to rise. Prognosis research helps experts understand patient information and carry out targeted interventions that reduce the probability of recurrence. Existing studies on stroke prognosis prediction rely on models trained offline. In the real world, however, data arrive continuously, and the distribution of newly arrived data usually deviates from that of the training data, causing a significant drop in the model's precision and recall. In real-time prediction scenarios, the model therefore has to be updated online continuously to keep its performance high. In addition, the pursuit of higher precision and recall in machine learning often maps the original features into high-dimensional spaces or hidden layers, which sacrifices some of the model's interpretability and limits its application in the medical field. Building on existing research, this paper combines online learning with interpretability to construct a model for stroke prognosis prediction and interpretation. The main research contents are as follows:

(1) Stroke data preprocessing and streaming data analysis. Data collected in the real world usually contain "dirty data", and feeding them directly into model training gives poor results. The original dataset is therefore preprocessed to address class imbalance, missing values, and differences in feature scale. To simulate the real setting after the model goes online, this paper uses the public IST dataset as the training data collected offline and the IST-3 data as the streaming data generated online. The model is trained on the IST data. After model training
is completed, the experiment simulates a data stream by feeding the IST-3 records to the model one at a time, in sequence, and recording the offline model's performance on them. This makes it possible to analyze the impact of streaming data on model performance and to choose an appropriate base model for the follow-up research.

(2) An online learning method is introduced to address the decline in model performance over time in real-time application scenarios. Many real-time medical platforms and online prediction tools exist in academia and medicine, but performance degradation after a model goes online is usually ignored. The common remedy is to update the model periodically, but this approach is not timely, and it is unclear when an update is needed. Most existing transfer learning algorithms are suited to static data streams and cannot adjust the classification model in time to adapt to the new data space when concept drift occurs. To address this, the paper proposes an online transfer learning method that detects concept drift in real time, triggers transfer learning when the deviation threshold is exceeded, and adapts the classifier to the new target domain using the CORAL transfer learning algorithm, thereby improving accuracy.

(3) Disease prediction research based on machine learning generally lacks interpretability, so the SHAP method is introduced to provide both global and local explanations. The global explanation ranks the importance of features and analyzes the influence of each feature on the model. The local explanation takes a single sample as input and reports the model's output together with the contribution of each input indicator to that output.
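
The drift-triggered adaptation described in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper. `coral_align` follows the standard CORAL recipe (whiten the source features with the source covariance, then re-colour them with the target covariance). The `ErrorRateDriftTrigger` class, its windowed error-rate rule, and all parameter names and default values are hypothetical stand-ins for the drift detector, whose details are not given here.

```python
from collections import deque

import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V * diag(eigvals ** power) * V^T
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source_x, target_x, reg=1.0):
    """Align source features to the target covariance (standard CORAL).

    `reg` is the identity regularizer added to both covariances; it keeps
    them invertible but makes the match approximate when it is large.
    """
    dim = source_x.shape[1]
    cov_s = np.cov(source_x, rowvar=False) + reg * np.eye(dim)
    cov_t = np.cov(target_x, rowvar=False) + reg * np.eye(dim)
    whiten = _sym_matrix_power(cov_s, -0.5)
    recolour = _sym_matrix_power(cov_t, 0.5)
    return source_x @ whiten @ recolour


class ErrorRateDriftTrigger:
    """Hypothetical drift rule: fire when the error rate over the last
    `window` predictions exceeds the offline baseline by `threshold`."""

    def __init__(self, baseline_error, threshold=0.1, window=50):
        self.baseline_error = baseline_error
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, was_wrong):
        """Record one prediction outcome; return True if drift is flagged."""
        self.recent.append(1.0 if was_wrong else 0.0)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        window_error = sum(self.recent) / len(self.recent)
        return window_error - self.baseline_error > self.threshold
```

In a streaming loop, each incoming record would be scored by the current classifier and its outcome passed to `update`. When `update` returns `True`, the offline training features would be passed through `coral_align` against a buffer of recent records and the classifier refit on the aligned features.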