
A Transfer Learning Approach To Correct Temporal Performance Drift Of Clinical Prediction Models

Posted on: 2022-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Xue
GTID: 2494306734966599
Subject: Computer Software Engineering
Abstract/Summary:
Acute kidney injury (AKI) is a common clinical problem with high mortality that can lead to short- and long-term complications such as chronic kidney failure, end-stage renal disease, and death. Accurately assessing each patient's risk from their susceptibility and exposures, followed by preventive intervention, is currently the effective approach to treatment, but the limitations of current diagnostic criteria mean that diagnosis is often delayed. Big-data-driven approaches based on electronic medical records (EMR), however, offer a unique opportunity for early warning of AKI. As "real-world evidence", EMR data carry temporal information that is volatile and heterogeneous, which causes the performance of risk prediction models to degrade over time. Resolving this temporal degradation is the core challenge for the long-term use of predictive models in clinical practice. To address it, this thesis designs three progressive experiments: first, determine whether model performance degrades; second, analyze the correlation between performance degradation and differences in data distribution; third, evaluate transfer learning strategies for countering performance degradation over time. The main research work of this thesis is as follows:

(1) Data extraction and preprocessing. Using the relevant data extraction and preprocessing strategies, this thesis collected 141,696 usable samples from the electronic medical records of 197,565 AKI patients at the cooperating medical centers. Following the Discrete-Time Survival Framework, the temporal information in the samples was processed (discretized).

(2) Characterizing how predictive performance changes over time. Models built with five different machine learning algorithms were validated year by year on new datasets from successive years. The results show that model performance gradually declines over time.

(3) Analyzing the mechanism behind the change in performance over time. Building on the Kolmogorov-Smirnov (KS) test, this thesis proposes a new KS-metric indicator that measures the difference between the distributions of the old and new datasets. The analysis finds that the volatility and heterogeneity of the temporal information in electronic medical records cause the performance of the risk prediction models to decline over time.

(4) A new Transfer GBM modeling framework based on transfer learning. The framework has three parts. First, Adapt Old Knowledge: on the feature set of the historical data, the original model is iteratively trained further on the new data, retaining the knowledge in the historical data that is useful for prediction. Second, Learn New Knowledge: a predictive model is trained on the important feature set of the new data to capture the knowledge unique to the new data. Third, the two models, representing old and new knowledge, are combined through a Stacking Ensemble. The thesis examines whether the Transfer GBM framework can effectively counter model performance degradation over time. The results show that, compared with the original model and with a model retrained on the new data, the model built with the Transfer GBM framework maintains stable and good performance when validated year by year.
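The abstract does not specify how the Discrete-Time Survival Framework was applied. A common reading is a person-period expansion: each admission becomes one row per day at risk, labelled by whether AKI onset falls in the following prediction window, with rows stopping at the first AKI day. The sketch below assumes daily intervals; the function `to_person_period`, its `horizon` parameter, the row layout and the example creatinine values are hypothetical illustrations, not the thesis's pipeline.

```python
from typing import List, Optional, Sequence, Tuple

# (patient_id, day index, features recorded on that day, label)
Row = Tuple[str, int, Sequence[float], int]


def to_person_period(patient_id: str,
                     daily_features: Sequence[Sequence[float]],
                     aki_day: Optional[int],
                     horizon: int = 1) -> List[Row]:
    """Expand one admission into discrete-time (person-period) rows.

    Hypothetical layout: row t uses the features recorded on day t and is
    labelled 1 if AKI onset falls within days t+1 .. t+horizon. Rows stop at
    the onset day, because the patient is no longer at risk of a first AKI
    episode; an admission with onset on day 0 contributes no rows.
    """
    n_days = len(daily_features)
    last_day = n_days if aki_day is None else min(aki_day, n_days)
    rows: List[Row] = []
    for t in range(last_day):
        label = int(aki_day is not None and t < aki_day <= t + horizon)
        rows.append((patient_id, t, daily_features[t], label))
    return rows


# Usage: a 4-day stay (hypothetical daily serum creatinine, mg/dL),
# AKI diagnosed on day 3, 2-day prediction window.
creatinine = [[0.9], [1.0], [1.4], [2.1]]
for row in to_person_period("p1", creatinine, aki_day=3, horizon=2):
    print(row)  # rows for days 0, 1, 2 with labels 0, 1, 1
```

Each row can then be fed to any binary classifier, which is what lets ordinary machine learning algorithms estimate a discrete-time hazard.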
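The year-by-year validation in item (2) amounts to a temporal validation loop: a model fitted once on an earlier cohort is frozen and scored on each later year's cohort, and a downward trend in the metric signals drift. The sketch assumes ROC AUC as the metric (the abstract does not name one) and computes it directly as the Mann-Whitney probability to stay dependency-free; `auc`, `year_by_year`, the dictionary input and the toy cohorts are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Cohort = Tuple[Sequence[Sequence[float]], Sequence[int]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2.

    Quadratic pairwise version, fine for a sketch; use a rank-based
    implementation for large cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def year_by_year(score_fn: Callable[[Sequence[Sequence[float]]], List[float]],
                 cohorts: Dict[int, Cohort]) -> Dict[int, float]:
    """Score one frozen model on every yearly cohort, in chronological order."""
    return {year: auc(score_fn(X), y) for year, (X, y) in sorted(cohorts.items())}


# Usage with a toy "model" that scores on the first feature only.
cohorts = {
    2017: ([[0.2], [0.6], [0.5], [0.3]], [0, 0, 1, 1]),
    2016: ([[0.1], [0.4], [0.35], [0.8]], [0, 0, 1, 1]),
}
print(year_by_year(lambda X: [x[0] for x in X], cohorts))  # {2016: 0.75, 2017: 0.5}
```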
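The abstract does not give the formula for the proposed KS-metric. As an assumption, the sketch below computes the two-sample Kolmogorov-Smirnov statistic D = sup_x |F_old(x) - F_new(x)| for each feature and aggregates the per-feature values with an optional weighted mean; that aggregation and the names `ks_statistic` and `ks_metric` are hypothetical. The per-feature value is the same statistic that `scipy.stats.ks_2samp` reports, computed here with NumPy to keep the example self-contained.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both ECDFs are right-continuous step functions that only jump at observed
    # values, so the supremum is attained on the pooled sample.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_metric(X_old, X_new, weights=None):
    """Hypothetical dataset-level drift score: weighted mean of per-feature KS statistics.

    Returns the aggregate score and the per-feature statistics, so the
    features that drifted most can be inspected.
    """
    X_old, X_new = np.asarray(X_old, dtype=float), np.asarray(X_new, dtype=float)
    per_feature = np.array(
        [ks_statistic(X_old[:, j], X_new[:, j]) for j in range(X_old.shape[1])])
    return float(np.average(per_feature, weights=weights)), per_feature


# Usage: the second feature's distribution shifts between two periods (synthetic data).
rng = np.random.default_rng(0)
old = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1, 500)])
new = np.column_stack([rng.normal(0, 1, 500), rng.normal(1, 1, 500)])
score, per_feature = ks_metric(old, new)
print(round(score, 3), per_feature.round(3))
```

Returning the per-feature vector alongside the aggregate makes it possible to relate a drop in performance to the specific EMR variables whose distributions moved.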
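The abstract names the three components of Transfer GBM but not their implementation. The sketch below is one possible realization, assuming scikit-learn. Adapt Old Knowledge continues boosting a copy of the historical model on the new data via `warm_start=True`, so the added trees are fitted to the loss gradient at the old ensemble's predictions on the historical feature columns. Learn New Knowledge keeps the `top_k` new-data features ranked by impurity-based `feature_importances_` and fits a fresh `GradientBoostingClassifier` on them. The Stacking Ensemble is a `LogisticRegression` meta-learner fitted on a held-out part of the new data. The class `TransferGBM`, its parameters, the 30% hold-out and the synthetic drifting cohorts are all hypothetical.

```python
import copy

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


class TransferGBM:
    """Hypothetical sketch: adapt old knowledge + learn new knowledge + stacking."""

    def __init__(self, old_model, old_features, extra_trees=50, top_k=5, random_state=0):
        self.old_model = old_model        # GBM already fitted on historical data
        self.old_features = old_features  # column indices of the historical feature set
        self.extra_trees = extra_trees    # boosting stages added on the new data
        self.top_k = top_k                # size of the "important" new-data feature set
        self.random_state = random_state

    def fit(self, X_new, y_new):
        # Hold out part of the new data so the stacker is fitted on predictions
        # for rows the base models were not trained on.
        X_fit, X_meta, y_fit, y_meta = train_test_split(
            X_new, y_new, test_size=0.3, stratify=y_new, random_state=self.random_state)

        # (1) Adapt Old Knowledge: keep the old trees and boost further on the new
        # data, restricted to the historical feature set. The original stays untouched.
        self.adapted_ = copy.deepcopy(self.old_model)
        self.adapted_.set_params(
            warm_start=True, n_estimators=self.old_model.n_estimators + self.extra_trees)
        self.adapted_.fit(X_fit[:, self.old_features], y_fit)

        # (2) Learn New Knowledge: rank new-data features by importance,
        # keep the top_k and fit a fresh GBM on that subset.
        probe = GradientBoostingClassifier(random_state=self.random_state).fit(X_fit, y_fit)
        self.new_features_ = np.argsort(probe.feature_importances_)[::-1][: self.top_k]
        self.learned_ = GradientBoostingClassifier(random_state=self.random_state).fit(
            X_fit[:, self.new_features_], y_fit)

        # (3) Stacking Ensemble: logistic regression on the two base-model risks.
        self.stacker_ = LogisticRegression().fit(self._base_risks(X_meta), y_meta)
        return self

    def _base_risks(self, X):
        return np.column_stack([
            self.adapted_.predict_proba(X[:, self.old_features])[:, 1],
            self.learned_.predict_proba(X[:, self.new_features_])[:, 1],
        ])

    def predict_proba(self, X):
        return self.stacker_.predict_proba(self._base_risks(X))


# Usage on synthetic cohorts whose risk factors drift (hypothetical data).
rng = np.random.default_rng(0)


def make_cohort(n, coef):
    X = rng.normal(size=(n, coef.size))
    y = (rng.random(n) < 1 / (1 + np.exp(-(X @ coef)))).astype(int)
    return X, y


old_features = np.arange(5)  # assume only the first 5 columns existed historically
new_coef = np.array([1.0, -1.0, 0, 0, 0, 1.5, 0, 0])  # feature 5 becomes predictive
X_old, y_old = make_cohort(2000, np.array([1.5, -1.0, 0.5, 0, 0, 0, 0, 0]))
X_new, y_new = make_cohort(1000, new_coef)
X_test, y_test = make_cohort(500, new_coef)

old_model = GradientBoostingClassifier(random_state=0).fit(X_old[:, old_features], y_old)
model = TransferGBM(old_model, old_features).fit(X_new, y_new)
risk = model.predict_proba(X_test)[:, 1]
```

Keeping the adapted model on the historical feature set lets it reuse old knowledge, while the second model is free to pick up predictors that only became informative in the new period (feature 5 in the toy data); the stacker then learns how much weight each source deserves.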
Keywords: Acute kidney injury, Electronic medical record, Machine learning, Risk prediction, Transfer learning