Application And Research Of Data Mining Based On C4.5 Algorithm

Posted on:2009-02-26
Degree:Master
Type:Thesis
Country:China
Candidate:Y P Yun
Full Text:PDF
GTID:2178360245986480
Subject:Computer software and theory
Abstract/Summary:
In recent decades, database and mass storage technologies have developed considerably. In the information age, how to use huge volumes of raw data to analyze the current situation and predict the future effectively has become a major challenge. Data mining technology arose in response to this need and has developed rapidly. It has recently become one of the most active research areas, and the knowledge it discovers can be used to support decision making.

The number of type II diabetes mellitus cases has been increasing rapidly worldwide. Both the latency period and the incidence of type II diabetes mellitus increase with age, which indicates that it is a progressive disease. Given this pattern of development, we attempt to use data mining algorithms to discover rules. We introduce data mining into research on the pathogenesis of type II diabetes mellitus in order to gain knowledge of that pathogenesis, to discover the relevant data and rules, and then to build a classification and prediction system for diabetes mellitus.

The source data on diabetes mellitus come from patients' health examination reports and from random sampling. We obtain the source data by appropriately transforming the data in the health examination reports and storing them in a database. Because these data are incomplete, noisy, and inconsistent, we preprocess them with data mining techniques such as data cleaning, data transformation, and data reduction.

The data mining task is to find patterns of illness in the large body of diabetes mellitus data and to build a decision system for preventing, diagnosing, and predicting diabetes mellitus.
Based on the type of mining task and the requirements of the mining algorithm, we choose the decision tree method for data mining. Because the diabetes mellitus data are continuous-valued, we choose C4.5 among the decision tree algorithms. Using an implementation of the C4.5 algorithm, we learn the patterns and rules of the illness from the preprocessed diabetes mellitus data and generate a set of rules for diagnosing and predicting diabetes mellitus. We then use the holdout method to estimate the classification accuracy.

Because the accuracy for the illness group obtained from this decision tree is not high enough, we treat the proportion of data used for training as a parameter and test how the accuracy for the illness group and the average accuracy vary with that proportion. In this way we obtain an improved classifier and the best solution for identifying the illness group.
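The abstract does not give implementation details, so the following is only a minimal Python sketch of the two ideas it names: a C4.5-style binary split on one continuous attribute, and a holdout estimate repeated for several training proportions. Everything in it is an assumption made for illustration. The function names `entropy`, `best_split`, and `holdout_accuracy`, the one-split classifier, and the `glucose`/`status` values are all hypothetical and are not taken from the thesis. Refinements found in published versions of C4.5 (multi-level trees, pruning, missing-value handling) are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(values, labels):
    """Choose a binary threshold (x <= t versus x > t) on one continuous attribute.

    Candidate cut points lie between adjacent distinct sorted values. The cut
    with the highest information gain wins; the lower of the two neighbouring
    values is used as the threshold, and the gain ratio (gain divided by split
    information) of that cut is returned with it.
    Returns (None, 0.0) when all values are identical.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_gain, best_threshold, best_ratio = -1.0, None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            split_info = entropy(["left"] * i + ["right"] * (n - i))
            best_gain = gain
            best_threshold = pairs[i - 1][0]
            best_ratio = gain / split_info
    return best_threshold, best_ratio


def holdout_accuracy(values, labels, train_ratio, seed=0):
    """Fit a one-split rule on a random share of the data and score the rest.

    Needs at least two records so that both parts are non-empty.
    """
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)
    cut = max(1, min(len(order) - 1, round(len(order) * train_ratio)))
    train, test = order[:cut], order[cut:]
    train_x = [values[i] for i in train]
    train_y = [labels[i] for i in train]
    threshold, _ = best_split(train_x, train_y)
    majority = Counter(train_y).most_common(1)[0][0]
    if threshold is None:
        left_label = right_label = majority  # no usable split: predict majority
    else:
        left = [y for x, y in zip(train_x, train_y) if x <= threshold]
        right = [y for x, y in zip(train_x, train_y) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]

    def predict(x):
        if threshold is None or x <= threshold:
            return left_label
        return right_label

    hits = sum(predict(values[i]) == labels[i] for i in test)
    return hits / len(test)


# Made-up values for illustration only; they are not data from the thesis.
glucose = [4.8, 5.1, 5.3, 5.6, 5.9, 6.4, 6.9, 7.2, 7.8, 8.5]
status = ["healthy"] * 5 + ["diabetic"] * 5

print(best_split(glucose, status))
for ratio in (0.5, 0.6, 0.7, 0.8, 0.9):
    runs = [holdout_accuracy(glucose, status, ratio, seed) for seed in range(20)]
    print(f"train ratio {ratio:.1f}: mean accuracy {sum(runs) / len(runs):.2f}")
```

Averaging over several random splits for each proportion is one simple way to see how the accuracy estimate moves with the training share. Whether the thesis averaged in this way is not stated in the abstract.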
Keywords/Search Tags:data mining, decision tree, C4.5 algorithm, diabetes mellitus