
Disease Risk Prediction Based On Incomplete Medical Data

Posted on: 2020-01-19, Degree: Master, Type: Thesis
Country: China, Candidate: H N Zhao, Full Text: PDF
GTID: 2404330596470884, Subject: Computer system architecture
Abstract/Summary:
After a decade of large-scale construction of medical digitization systems, a large number of electronic health records have accumulated in China, providing a rich data foundation for disease risk prediction. However, these medical data suffer from serious missingness: data values or attributes are lost to subjective and objective factors during collection or storage, resulting in a loss of information and knowledge. Currently, there are few effective methods for learning from such quantified, structured medical data. At the same time, most deep learning models used in disease risk prediction do not expose their internal mechanisms or explain their outputs, whereas disease diagnosis and clinical decision making require models that provide adequate and reasonable evidence. In addition, traditional medical diagnosis relies to a large extent on the doctor's experience and domain knowledge. For doctors to accumulate experience and supplement their knowledge quickly while interacting with a model, the model must be able to give a basis for its judgments. Further research is therefore needed on how to retain the strong performance of deep models while making prediction models more interpretable.

In view of these problems, this paper proposes a mimic learning method suited to disease risk prediction, in which a spectral regularization method is used to learn from and estimate incomplete medical data. Spectral regularization excels at exploiting the structure of the problem and of the data themselves, and can handle large-scale matrices. The imputed data minimize the cumulative error passed to subsequent algorithms, and high-quality data can be obtained efficiently without domain-specific knowledge or expert labeling. On this basis, a deep forest uses multi-granularity scanning and a cascade structure for disease risk prediction, yielding interpretable classification results with the advantages of fewer parameters and insensitivity to parameter settings.

This paper evaluates the proposed method on a public dataset (an amyotrophic lateral sclerosis (ALS) dataset) and a private dataset (thyroid cancer data). The ALS dataset contains information on 6,842 ALS patients who participated in clinical trials, and the thyroid cancer dataset contains more than 11,745 thyroid cancer patients and 216 patient attributes. In the experiments, the proposed method outperforms the comparison methods and produces relatively effective and stable classification predictions.
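
The abstract does not say which spectral regularization algorithm is used. As an illustration only, the sketch below follows the well-known Soft-Impute scheme, which repeatedly fills the missing entries with the current estimate and soft-thresholds the singular values. The function name `soft_impute`, the parameters `lam`, `max_iter`, and `tol`, and the toy matrix are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np


def soft_impute(X, lam=1.0, max_iter=200, tol=1e-5):
    """Fill NaN entries of X with a low-rank estimate (Soft-Impute sketch).

    lam is the (assumed) shrinkage applied to the singular values;
    larger values give a lower-rank, more heavily regularized estimate.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)      # True where a value was recorded
    Z = np.zeros_like(X)         # current low-rank estimate
    for _ in range(max_iter):
        # Keep observed values; use the current estimate for missing ones.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)   # soft-threshold the singular values
        Z_new = (U * s) @ Vt
        # Relative change between successive estimates.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Observed entries are returned unchanged; only gaps are imputed.
    return np.where(observed, X, Z)


if __name__ == "__main__":
    # Toy patient-by-attribute matrix with two missing values (hypothetical).
    records = np.array([
        [1.0, 2.0, 3.0],
        [2.0, 4.0, np.nan],
        [3.0, np.nan, 9.0],
        [4.0, 8.0, 12.0],
    ])
    print(soft_impute(records, lam=0.1))
```

The completed matrix could then be passed to any downstream classifier. The thesis uses a deep forest for that step, which is not reproduced here.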
Keywords/Search Tags:electronic health records, spectral regularization, deep forest, mimic learning, disease risk prediction