
Application Of Ensemble Decision Tree De Based On Improved Data Protocol In Medical Decision-Making

Posted on: 2019-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Xu
GTID: 2428330548473580
Subject: Software Engineering Technology
Abstract/Summary:
Medical data mining is an important research field within data mining and has long been a research hotspot in both computer science and medicine. Many well-established data mining algorithms, such as decision trees, neural networks, and naive Bayes, are currently used in medical research. However, medical data exhibit characteristics such as feature redundancy and imbalanced class distributions, which make it difficult to apply traditional data mining algorithms to them directly. Ensemble learning has been widely used in medical data mining because it offers good classification performance and fast model construction, and it can improve the classification performance of unstable algorithms such as decision trees. This thesis therefore takes UCI standard data sets and an ECG clinical diagnosis data set as its research objects and studies feature selection and data resampling methods based on the Bagging C4.5 algorithm.

First, the S-C4.5-SMOTE sampling method is proposed to address the class imbalance of medical data. Although the traditional SMOTE method effectively reduces class imbalance, it does so by increasing the number of minority-class samples, which increases the workload of subsequent learning algorithms. The proposed method reduces both the class imbalance and the number of samples without distorting the data. Experimental results on UCI data sets show that it performs significantly better than SMOTE: data resampled with S-C4.5-SMOTE contain fewer samples and also yield higher classification accuracy.

Second, a Wrapper feature selection method is proposed to address the feature redundancy of medical data. The Wrapper method evaluates candidate feature subsets by the classification performance of the subsequent learning algorithm; this gives it low bias but a high computational cost, so it is unsuitable for large data sets. The S-C4.5-SMOTE method is therefore applied before the Wrapper method to reduce the number of samples and so improve the Wrapper method's efficiency. Experiments on UCI data sets confirm that applying S-C4.5-SMOTE before the Wrapper method makes it more efficient.

Finally, to address the instability of the C4.5 algorithm, a Bagging ensemble of C4.5 classifiers is proposed, in which multiple decision tree base classifiers are constructed for the same problem. Experimental results on UCI data sets show that the ensemble effectively improves classification accuracy and overcomes the instability of C4.5. Applied to ECG data, the proposed method achieves a classification accuracy of 90.1%, successfully implementing classification and prediction for ECG data.
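The abstract contrasts S-C4.5-SMOTE with traditional SMOTE but does not describe how S-C4.5-SMOTE works, so the sketch below shows only the classic SMOTE baseline it is compared against. Each synthetic point is placed at a random position on the segment between a minority-class sample and one of its k nearest minority-class neighbours, which is why SMOTE enlarges the minority class. The function name `smote`, its parameters, and the choice of drawing base samples at random are illustrative assumptions, not the thesis's code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Classic SMOTE sketch (hypothetical helper): create n_synthetic points by
    interpolating between minority samples and their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a base minority sample at random
        x = minority[i]
        # k nearest neighbours of x among the *other* minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Usage: oversample a tiny two-feature minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Every synthetic point is a convex combination of two real minority samples, so the minority class grows in size while staying inside the region the real samples span.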
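A Wrapper method scores each candidate feature subset by running the downstream learner on it, which is what makes it accurate but expensive. The following is a minimal sketch of greedy forward wrapper selection. It assumes a leave-one-out 1-nearest-neighbour classifier as the evaluator purely to keep the example short and dependency-free; the thesis evaluates subsets with its C4.5-based learner instead, and the names `loo_1nn_accuracy` and `forward_wrapper_selection` are hypothetical.

```python
import math


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that only
    looks at the given feature indices (stand-in for the real learner)."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist([xi[f] for f in features],
                                    [X[j][f] for f in features]),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_wrapper_selection(X, y, evaluate=loo_1nn_accuracy):
    """Greedy forward wrapper selection: add the feature whose inclusion gives
    the best learner score; stop as soon as no addition improves the score."""
    selected, best_score = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scores = {f: evaluate(X, y, selected + [f]) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no remaining feature improves the subset
        selected.append(f_best)
        best_score = scores[f_best]
        remaining.remove(f_best)
    return selected, best_score


# Usage: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
y = [0, 0, 0, 1, 1, 1]
print(forward_wrapper_selection(X, y))  # ([0], 1.0)
```

Each call to the evaluator here loops over every pair of samples, so shrinking the data set before selection (the role S-C4.5-SMOTE plays in the abstract) directly reduces the cost of every step of the search.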
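Bagging trains each base tree on a bootstrap sample (drawn with replacement) of the training set and combines the trees' predictions by majority vote, which damps the variance of an unstable learner such as a decision tree. The sketch below assumes numeric features and uses a simplified, unpruned entropy-based tree with binary threshold splits and plain information gain as a stand-in for C4.5 (full C4.5 also uses gain ratio, pruning, and missing-value handling). All function names and defaults, such as `n_trees=15`, are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=5, min_size=2):
    """Grow an unpruned binary tree on numeric features, splitting on the
    threshold with the highest information gain (simplified C4.5 stand-in).
    Internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority
    base, best = entropy(y), None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None or best[0] <= 0:
        return majority  # no useful split found
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_size)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def bagging_fit(X, y, n_trees=15, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def bagging_predict(trees, x):
    """Combine the trees' predictions by majority vote."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Usage: a one-feature toy problem.
X = [[float(i)] for i in range(10)]
y = [0] * 5 + [1] * 5
trees = bagging_fit(X, y, n_trees=15, seed=0)
print(bagging_predict(trees, [0.0]), bagging_predict(trees, [9.0]))
```

Because each tree sees a different bootstrap sample, individual trees can disagree near the class boundary; the vote keeps the ensemble's prediction stable where any single tree might flip.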
Keywords: Ensemble learning, Decision tree, Feature selection, Data resampling, Medical data mining