Research On Disease Prediction Based On Integrated Learning - Case Study: Data Analysis Of Cardiovascular And Cerebrovascular Diseases

Posted on: 2022-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Hu
GTID: 2504306764991619
Subject: Automation Technology
Abstract/Summary:
With the rapid development of artificial intelligence and information technology, internet technology and medical technology are steadily converging, and statistical methods and machine learning algorithms are now widely used in disease prediction. Cardiovascular disease seriously endangers human health worldwide, so effective prediction models are important for controlling disease risk and protecting the physical and mental health of the population. Existing research has typically built cardiovascular disease prediction models with conventional statistical methods or traditional machine learning algorithms, but the predictive performance of a single classifier is often limited. This thesis therefore takes ensemble (integrated) models as base classifiers and applies Stacking fusion and Voting fusion, respectively, to build a cardiovascular disease prediction model that can support early warning and intervention for high-risk groups.

The thesis first reviews the theory of data mining classification techniques and classifier evaluation metrics. Using the UCI myocardial infarction dataset as the research object, it applies data pre-processing techniques for data cleaning and uses filter and embedded methods for feature selection. To address class imbalance, SMOTE, Borderline-SMOTE and ADASYN are compared, and SMOTE is selected after analysis; the result is 3,082 samples with 24 features for subsequent modeling. The samples are divided into training and test sets in a 7:3 ratio, and classical logistic regression, random forest, XGBoost and LightGBM are each used to build a prediction model for the recurrence of myocardial infarction and then compared. The three ensemble models (random forest, XGBoost and LightGBM) are then fused using the Stacking and Voting algorithms, and the prediction performance of the models is compared on four evaluation metrics: AUC, Recall, F1 score and Accuracy.

The results show that the Stacking and Voting fusion models, with random forest, XGBoost and LightGBM as base classifiers, predict better than the single models. Voting fusion is stronger overall than Stacking fusion in both F1 and Recall, with an AUC above 0.98. This indicates that the fusion model established in this thesis effectively improves overall classification accuracy and model stability, providing a new idea for cardiovascular disease modeling and analysis.
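To make the Voting fusion and the four evaluation metrics concrete, the following is a minimal sketch in plain Python. It assumes soft voting (averaging the positive-class probabilities of the base classifiers) and a 0.5 decision threshold; the abstract does not say whether the thesis uses soft or hard voting. The probability lists `rf`, `xgb` and `lgbm` and the labels `y_true` are invented toy values for illustration only, not outputs of the thesis models or rows of the UCI dataset, and all function names are hypothetical.

```python
from statistics import mean


def soft_vote(prob_lists):
    """Average the positive-class probabilities of several models, per sample."""
    return [mean(ps) for ps in zip(*prob_lists)]


def to_labels(scores, threshold=0.5):
    """Turn probabilities into 0/1 predictions (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half. Assumes both classes are present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): true labels and per-model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
rf = [0.9, 0.6, 0.4, 0.3, 0.2, 0.6, 0.8, 0.1]
xgb = [0.8, 0.7, 0.6, 0.2, 0.6, 0.3, 0.7, 0.2]
lgbm = [0.7, 0.45, 0.7, 0.4, 0.3, 0.3, 0.9, 0.3]

fused = soft_vote([rf, xgb, lgbm])

for name, scores in [("rf", rf), ("xgb", xgb), ("lgbm", lgbm), ("voting", fused)]:
    pred = to_labels(scores)
    print(name,
          "AUC=%.3f" % auc(y_true, scores),
          "Recall=%.3f" % recall(y_true, pred),
          "F1=%.3f" % f1_score(y_true, pred),
          "Accuracy=%.3f" % accuracy(y_true, pred))
```

In this toy setup each base model misclassifies a different sample, so averaging their probabilities cancels the individual errors. Stacking differs in that a second-level learner is trained on the base models' outputs instead of averaging them, which is not shown here.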
Keywords/Search Tags: Cardiovascular disease, Integrated learning, Model fusion, Classification prediction, Myocardial infarction