Accurate Modeling Of Cardiovascular Chronic Disease Risk Based On Medical Big Data

Posted on: 2022-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Yang
GTID: 1484306494486634
Subject: Computer application technology
Abstract/Summary:
Cardiovascular disease is the leading cause of death and disability worldwide, and China now has the largest number of deaths from cardiovascular disease of any country. According to statistics from the National Cardiovascular Center, in 2019 there were about 330 million cardiovascular disease patients in China, and two out of every five deaths were due to cardiovascular disease. Cardiovascular disease is often the result of the synergistic effect of multiple risk factors, and early detection and treatment are the key to preventing and controlling it. Disease risk assessment is an effective means of identifying high-risk groups early. Several cardiovascular disease risk assessment scales have been released, but their applicability outside the regions where they were developed has been questioned. Traditional risk assessment scales are mostly built from prospective studies, which are time-consuming and labor-intensive. With the continuous advancement of medical informatization, a large amount of medical data has accumulated, which provides data support for mining potential risk factors, and the application of big data technology in medicine and clinical practice has attracted wide attention. However, poor data quality and the lack of data standards restrict the development and application of medical big data. Therefore, starting from the historical data held in a regional health information platform, this thesis studies techniques for processing and analyzing clinical electronic medical record data, as well as cardiovascular disease risk modeling methods based on that data. The main research contents are as follows.

First, to address the high noise, missing data, and unstructured text found in clinical electronic medical record data, medical data processing and analysis techniques are studied in order to construct high-quality electronic medical record data sets, which lay the data foundation for a high-precision cardiovascular disease risk prediction model. For the irregular clinical terminology and high noise in Chinese diagnostic texts, an automatic disease diagnosis coding method based on similarity calculation is proposed, with a coverage rate of over 80%. For the large number of missing diagnoses in medical imaging examinations, a medical text classification method based on a convolutional neural network is proposed; its AUC in the binary classification experiment reached 0.998, which enables automatic filling of disease types. For the mixed use of drug names and non-standard laboratory test results, a drug dictionary and test item matching rules are established, achieving drug classification for 16 cardiovascular diseases and the structuring of 50 million laboratory test records. In the end, two high-quality electronic medical record data sets are established, covering 250,000 registered hypertension patients and 90,000 registered diabetes patients respectively.

Second, to address the insufficient accuracy of cardiovascular disease risk assessment scales, disease risk modeling methods based on clinical electronic medical record data are studied to explore important risk factors and their complex relationship with the occurrence of cardiovascular events. The thesis starts from registered hypertensive patients, whose data are the most complete, and selects coronary heart disease, which has the largest number of patients, for study. Multi-temporal trend features are constructed from the historical electronic medical record data and combined with traditional risk factors to establish a three-year coronary heart disease risk prediction model with an AUC of 0.943. On the independent validation set, the established model outperforms the traditional risk assessment scales, improving the AUC by 0.15. The study finds that the trend features of physiological parameters have a significant impact on the prediction performance of the model, and it analyzes their nonlinear correlation with the occurrence of coronary heart disease events.

Third, focusing on cardiovascular disease risk modeling and the discovery of potential risk factors, stroke, a cerebrovascular disease, is selected for further study. In addition to traditional risk factors, several clinically significant features (such as pulse pressure and blood pressure variability) are constructed, and a three-year stroke prediction model is established for hypertensive patients, with an AUC of 0.922. On the independent validation set, the established model outperforms the traditional risk assessment scales, improving the AUC by 0.17. The analysis again confirms the nonlinear effect of blood pressure trend features on cardiovascular events, which can provide guidance for the scientific management of hypertension patients.

Fourth, existing renal failure risk prediction models rely on an early diagnosis of chronic kidney disease, a condition with low awareness. To address this, a renal failure risk prediction model for the general population is studied. Renal failure increases the risk of death from cardiovascular disease, which accounts for approximately 50% of deaths in patients with renal failure. Starting from registered patients with hypertension and diabetes, and combining multimodal data from laboratory tests, physiological monitoring, and clinical diagnosis, this thesis establishes a three-year renal failure risk prediction model with an AUC of 0.914, extending renal failure risk prediction to patients with chronic diseases. Several novel biomarkers of renal failure (such as alanine aminotransferase and aspartate aminotransferase) are identified, which can be applied to early screening for renal disease in a wider population, and their nonlinear effects on renal failure are analyzed.

Finally, the work of this thesis is summarized and future research directions are outlined.
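
As an illustration of the kind of derived features the abstract mentions (pulse pressure, blood pressure variability, and trend features), here is a minimal Python sketch. The abstract does not give the thesis's actual definitions, so everything here is an assumption: variability is taken as the population standard deviation of systolic readings, the trend as the least-squares slope over time, and the names `bp_features`, `linear_trend`, and the visit tuple layout are hypothetical.

```python
from statistics import mean, pstdev


def pulse_pressure(systolic, diastolic):
    """Pulse pressure: systolic minus diastolic blood pressure (mmHg)."""
    return systolic - diastolic


def linear_trend(times, values):
    """Least-squares slope of values against times (assumed trend definition)."""
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0  # a single time point carries no trend information
    numer = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return numer / denom


def bp_features(visits):
    """Summarize one patient's visits.

    visits: list of (days_since_first_visit, systolic, diastolic) tuples;
    this layout is hypothetical, chosen only for the example.
    """
    times = [v[0] for v in visits]
    sbp = [v[1] for v in visits]
    dbp = [v[2] for v in visits]
    pp = [pulse_pressure(s, d) for s, d in zip(sbp, dbp)]
    return {
        "mean_pulse_pressure": mean(pp),
        "sbp_variability_sd": pstdev(sbp),  # assumed variability measure
        "sbp_trend_per_day": linear_trend(times, sbp),
    }


features = bp_features([(0, 140, 90), (30, 150, 90), (60, 160, 100)])
```

Features of this kind would then be fed, alongside traditional risk factors, into whatever classifier is used; the abstract does not name the model, so none is assumed here.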
Keywords/Search Tags: Medical Big Data, Cardiovascular Disease, Risk Prediction, Machine Learning