Study On Clinical Predictive Models Based On Limited Data

Posted on:2020-07-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Xia

Full Text:PDF

GTID:1360330605956726

Subject:Biomedical engineering

Abstract/Summary:

With the rapid development of information technology and the spread of hospital information systems, patients' clinical data can now be recorded and stored electronically. These data reflect patients' physical condition and provide an important basis for assessing disease severity and patient risk. Clinical prediction models, the diagnostic and prognostic tools built with machine learning, are valuable in providing a scientific basis and decision support for diagnosing diseases, planning treatment and carrying out medical research. In clinical practice, however, medical data are usually small and incomplete, which restricts the application of clinical prediction models: they may suffer from overfitting, unacceptable prediction error and instability. To address these problems, this dissertation studies methods for constructing prediction models from size-limited data, aiming to improve model performance in clinical prediction and to better support clinical decision-making. The main work of the dissertation is as follows.

(1) To deal with missing medical data, an adaptive weight voting random forest algorithm (AWVRF) was developed. When the splitting attributes of a node are missing, the algorithm allows the instance being classified to exit at the current node with a vote, adjusts the weight of the vote according to the strength of the attributes involved, and makes the final decision by weighted voting. The algorithm was tested on ten UCI benchmark datasets, and the experimental results demonstrate that the accuracy and AUROC of AWVRF are both superior to those of current imputation-based decision algorithms (meanImpute-RF, LeoFill-RF, knnImpute-RF, BPCAfill-RF) and of random forest with surrogate decisions (surrRF). Compared with surrRF, AWVRF is more computationally efficient while retaining good classification capability.

(2) To improve the validity of prognosis models built on small samples of critical diseases, a transferring long short-term memory algorithm (transLSTM) was developed. Inspired by transfer learning, the algorithm first builds a prototype prognostic model on large data from relevant diseases, then transfers the model structure and important parameters to the target disease model, and finally completes the target model by further adjusting the network with the target disease samples. The algorithm was tested on the MIMIC-III database, and the results show that transLSTM achieves an AUROC 0.02 to 0.07 higher and an AUPRC 0.05 to 0.14 larger than the traditional long short-term memory (LSTM) algorithm, while needing only 39% to 64% of the training iterations of the traditional algorithm. Results on a clinical sepsis dataset reveal that the transLSTM model trained on only 100 samples has mortality prediction performance comparable to that of the traditional model trained on 250 samples.

(3) To model complex clinical data with limited samples from multiple diseases, a long short-term memory ensembling algorithm (eLSTM) was developed. Using an ensembling framework, the algorithm first generates diverse subsets by combining bootstrapped samples with a random feature subspace strategy, then trains separate LSTM base classifiers on these subsets. At prediction time, the algorithm obtains an overall result by merging the decisions of all LSTM base classifiers. The algorithm was tested on the MIMIC-III database, and the results show that, compared with clinical scoring systems (SAPS II, SOFA and APACHE ?), random forests and a single LSTM classifier, the eLSTM model has the best prognosis performance, with the largest AUROC (0.8451) and the largest AUPRC (0.4862).

The innovations of the study are summarized as follows.

(1) An adaptive weight voting random forest algorithm, AWVRF, is proposed, which performs classification directly on incomplete datasets. The technique addresses the failure of the random forest model when data are partially missing. The algorithm has outstanding classification ability and computational efficiency on incomplete datasets.

(2) A transferring long short-term memory algorithm, transLSTM, is proposed, which is suitable for small samples of a specific critical disease. Based on the idea of transfer learning, the technique leverages large data from relevant diseases to address the problem of large prediction error on small clinical samples. The algorithm offers high prediction accuracy and fast training.

(3) A long short-term memory ensembling algorithm, eLSTM, is proposed, which fits the complex clinical environment of multiple diseases. By adopting an ensembling framework, the technique solves the problem of modelling small-sized ICU data from multiple diseases and complications. The algorithm can dynamically produce accurate assessments of patient severity and provides a potential tool for establishing a unified model of all diseases.

In conclusion, the proposed clinical prediction models are suited to clinical situations with limited data, can accurately assess patients' condition, and provide information that helps medical staff manage and treat patients. The study deepens the application of machine learning in medicine and helps promote the development and improvement of clinical decision support systems.
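The abstract's description of eLSTM (bootstrapped samples, a random feature subspace, and merged decisions) can be illustrated with a minimal sketch. This is not the dissertation's implementation. The function names, the `CentroidStandIn` class (a toy stand-in for an LSTM base classifier that assumes labels 0 and 1), and the choice to merge by simple averaging are all assumptions made for illustration.

```python
import random
from statistics import mean


def make_subsets(n_samples, n_features, n_models, n_sub_features, seed=0):
    """Return one (row_indices, feature_indices) pair per base classifier."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_models):
        # Bootstrap: draw row indices with replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Random subspace: draw feature indices without replacement.
        feats = sorted(rng.sample(range(n_features), n_sub_features))
        subsets.append((rows, feats))
    return subsets


class CentroidStandIn:
    """Hypothetical stand-in for an LSTM base classifier (labels must be 0/1)."""

    def fit(self, X, y):
        self.prior = mean(y)
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        if len(self.centroids) < 2:
            # The bootstrap sample contained a single class only.
            return float(self.prior)
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c))
            for label, c in self.centroids.items()
        }
        return 1.0 if dist[1] < dist[0] else 0.0


def train_ensemble(X, y, n_models=5, n_sub_features=2, seed=0):
    """Train one stand-in base classifier per bootstrapped feature subset."""
    subsets = make_subsets(len(X), len(X[0]), n_models, n_sub_features, seed)
    models = []
    for rows, feats in subsets:
        X_sub = [[X[r][f] for f in feats] for r in rows]
        y_sub = [y[r] for r in rows]
        models.append((CentroidStandIn().fit(X_sub, y_sub), feats))
    return models


def predict_ensemble(models, x):
    """Merge base-classifier outputs by averaging (an assumed merging rule)."""
    return mean([m.predict_proba([x[f] for f in feats]) for m, feats in models])


# Toy data: three features, two classes.
X = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.2], [0.2, 0.0, 0.1],
     [0.9, 1.0, 0.8], [1.0, 0.9, 1.1], [0.8, 1.1, 0.9]]
y = [0, 0, 0, 1, 1, 1]
models = train_ensemble(X, y, n_models=5, n_sub_features=2, seed=0)
risk = predict_ensemble(models, [0.95, 1.0, 0.9])
```

Each base classifier sees a different resampling of the patients and a different subset of the variables, which is what gives the ensemble its diversity. In the dissertation's setting the stand-in would be replaced by an LSTM trained on time-series records.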

Keywords/Search Tags:

clinical prediction model, missing data, small samples, transfer learning, ensembling framework

Related items

1	Missing Data Processing Method And Its Application In Clinical Trials
2	Research On Terahertz Spectral Identification With Small Samples
3	Variable Selection For Transformation Models Based On Quantile Regression With Missing And Censored Data
4	Comparative Study Of Different Methods In Dealing With Missing Data In Clinical Trials
5	Modelling Of Near-infrared Spectroscopy Based On Semi-supervised Learning And Transfer Learning
6	Research On Small Sample Biomedical Data Analysis Based On Deep Learning
7	Improved generalized estimating equations for incomplete longitudinal binary data, covariance estimation in small samples, and ordinal data
8	A Probability Based Framework for Testing the Missing Data Mechanism
9	Methodological and clinical issues in analysis of data from HIV cardiovascular research: Validity of ultrasound methods, impact of anti-retroviral therapy on atherosclerosis, and imputation of missing values
10	Improved Algorithms Based On Extreme Learning Machine For Handing With Missing Data And Application