
Comparative Study On The Application Of Classification Models For Predicting The Prognosis Of Acute Coronary Syndrome Patients

Posted on: 2019-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Zhang
GTID: 1364330542997286
Subject: Epidemiology and Health Statistics
Abstract/Summary:
[Objective] China has a large number of patients with cardiovascular disease. However, the quality of clinical cardiovascular data is uneven, and the use and mining of these data fall short of advanced international levels. Consequently, research on "the construction and application of a data platform for cardiovascular disease" is crucial for the study of cardiovascular disease. Acute coronary syndrome (ACS) is a severe type of coronary artery disease. Patients with ACS may have different prognoses because of differing disease states and risk factors. A small proportion of ACS patients experience major adverse cardiac and cerebrovascular events (MACCE), including death, so predicting the prognosis of ACS patients has become an important research topic. Because the incidence of MACCE among ACS patients is low, predicting prognosis is a problem of classification on imbalanced datasets, for which modeling is difficult. Some researchers have built numerous score models based on logistic regression or Cox regression to predict the prognosis of patients with ACS. However, these models are not adequate for evaluating the risk of MACCE in ACS patients. This thesis therefore aims to apply a range of classification models from statistics and machine learning to predict the prognosis of ACS patients, and to identify the optimal models for classifying imbalanced datasets. Additionally, we attempted to construct novel models by combining statistical and machine learning methods. These models may complement and develop the theory of classification.

[Content and Methods] The study was based on the DESIRE-2 database. We defined inclusion and exclusion criteria to select observations suitable for modeling. Following the suggestions of medical specialists, we eliminated some variables before modeling to ensure the robustness of the models. To avoid introducing redundant information into the models, we applied stepwise discriminant analysis and stepwise logistic regression to select valuable variables from the initial set of factors. Combining these results with the specialists' suggestions, we determined the variables used in the statistical and machine learning classification models. For decision trees and decision-tree-based ensemble learning models, we used all variables in the initial set of factors. Predicting the prognosis of ACS patients essentially means building a classifier on imbalanced datasets. We used a resampling method to balance the sample sizes of patients who experienced MACCE and those who did not, and then used the new datasets to build classifiers with statistical and machine learning classification models. The classical classification models were distance discriminant analysis, Fisher discriminant analysis and maximum-likelihood discriminant analysis. The Bayes classification models were Bayes discriminant analysis and Bayes formula discriminant analysis. The machine learning classification models were decision tree, artificial neural network, support vector machine and ensemble learning. Additionally, we attempted to build Bagging-Bayes and Adaboost-Bayes models by combining statistical classification models with ensemble learning methods. We set out the theory of these two models and put them into practice. We used k-fold cross-validation to evaluate the generalization performance of all models. Sensitivity, specificity, positive predictive value, accuracy, G-mean, F-measure and ROC characteristics were used to evaluate model performance and thereby identify the optimal models for classifying imbalanced datasets.

[Results] Following the suggestions of medical specialists, we determined the initial set of factors, which contained 19 quantitative variables and 28 qualitative variables. Of these, 8 quantitative variables and 12 qualitative variables were selected by statistical methods and professional advice. The quantitative variables were age, left ventricular ejection fraction, hemoglobin, white blood cell count, neutrophil ratio, serum creatinine, total cholesterol and low-density lipoprotein. The qualitative variables were COPD history, GFR rank, ACS type, diabetes mellitus, hypertension, atrial fibrillation and atrial flutter, old myocardial infarction, vessel involvement, revascularization strategy, complete revascularization, and use of ACEI and ARB. These variables were entered into the classical classification models, the Bayesian classification models and some of the machine learning models. For decision trees and decision-tree-based ensemble learning models, all variables in the initial set of factors were used.

For this imbalanced classification problem, the performance of the classical and Bayesian classification models was not ideal. For these models, sensitivity ranged from 0.483 to 0.559, positive predictive value from 0.564 to 0.576, G-mean from 0.553 to 0.560, and F-measure from 0.523 to 0.561. The area under the ROC curve ranged from 0.562 to 0.593, suggesting that these models had low ability to predict the prognosis of ACS patients. Among the machine learning techniques, the performance of the simple decision tree, medium decision tree, BP artificial neural network, linear support vector machine and Adaboost-DT was also not ideal. For these models, sensitivity ranged from 0.368 to 0.626, positive predictive value from 0.564 to 0.610, G-mean from 0.520 to 0.602, and F-measure from 0.450 to 0.610. The area under the ROC curve ranged from 0.556 to 0.640, again suggesting low predictive ability. The area under the ROC curve was 0.768 for the complex decision tree model, suggesting medium predictive ability. For the Gaussian support vector machine and Bagging-DT models, the areas under the ROC curve were 0.992 and 0.988, respectively, suggesting that these two models had high ability to predict the prognosis of ACS patients.

We also built Bagging-Bayes and Adaboost-Bayes models by combining Bayesian classification models with ensemble learning methods. For these two models, sensitivity was 0.564 and 0.462, positive predictive value was 0.566 and 0.591, G-mean was 0.560 and 0.557, and F-measure was 0.565 and 0.518, respectively. Their performance was not ideal. The areas under the ROC curve were 0.593 and 0.562, respectively, indicating that the two models had low ability to predict the prognosis of patients with ACS.

[Conclusion] Predicting the prognosis of ACS patients essentially means building a classifier on imbalanced datasets. For this problem, the performance of the classical classification models, Bayesian classification models, simple decision tree, medium decision tree, BP artificial neural network, linear support vector machine, Adaboost-DT, Bagging-Bayes and Adaboost-Bayes models was not ideal; they had low ability to predict the prognosis of ACS patients. The complex decision tree had medium predictive ability. The Gaussian support vector machine and Bagging-DT models performed best, with high ability to predict the prognosis of patients with ACS. The thesis advances the theoretical study of classification on imbalanced datasets and complements and develops the theory of classification.
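
The evaluation measures named in the abstract can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the thesis's own code. The abstract does not give its formulas, so two definitions here are assumptions: G-mean is taken as the geometric mean of sensitivity and specificity, and F-measure as the harmonic mean of positive predictive value and sensitivity (F1). The function names and the example counts are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a denominator is zero
    # (for example, when a model predicts no positives at all).
    return numerator / denominator if denominator else float("nan")


def f_measure(ppv, sensitivity):
    # Assumed definition: harmonic mean of PPV (precision) and sensitivity (recall).
    return _ratio(2 * ppv * sensitivity, ppv + sensitivity)


def classification_metrics(tp, fn, fp, tn):
    """Compute the abstract's evaluation measures from confusion-matrix counts.

    The positive class is taken to be patients who experienced MACCE.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f_measure": f_measure(ppv, sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in classification_metrics(tp=40, fn=60, fp=30, tn=870).items():
        print(f"{name}: {value:.3f}")
```

Under the assumed F1 definition, the Bagging-Bayes sensitivity (0.564) and positive predictive value (0.566) reported above give `f_measure(0.566, 0.564)` of about 0.565, which agrees with the reported F-measure.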
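
The resampling and k-fold cross-validation steps described in the methods can be combined in more than one way. The sketch below is one possible arrangement and not a reconstruction of the thesis's procedure. The abstract does not say which resampling method was used or whether it was applied before or within cross-validation. This sketch assumes random oversampling of the minority class and applies it only to the training folds, so that each held-out fold keeps its original class balance. All function names are hypothetical, and labels are assumed to be coded 1 (MACCE) and 0 (no MACCE).

```python
import random


def stratified_k_folds(labels, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def oversample_minority(indices, labels, seed=0):
    """Duplicate randomly chosen minority-class indices until classes are equal."""
    rng = random.Random(seed)
    positives = [i for i in indices if labels[i] == 1]
    negatives = [i for i in indices if labels[i] == 0]
    if len(positives) < len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    if not minority:
        return list(indices)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def balanced_cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices); only the training part is balanced."""
    folds = stratified_k_folds(labels, k, seed)
    for held_out in range(k):
        test_indices = folds[held_out]
        train_indices = [
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        ]
        yield oversample_minority(train_indices, labels, seed + held_out), test_indices
```

Each `(train_indices, test_indices)` pair can then be passed to any of the classifiers listed above, with the measures computed on the untouched test fold.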
Keywords/Search Tags: Acute coronary syndrome, imbalanced datasets, classification model, statistical modeling, machine learning