Classification Algorithm Of Data Mining

Posted on: 2009-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: W X Guo
GTID: 2178360242983095
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of human society and computer technology, electronic data has accumulated at an explosive rate. These enormous volumes of data undoubtedly contain a great deal of latent knowledge that is valuable to people, yet traditional data analysis tools exploit only a small proportion of it. Data mining, a technique under continual development, helps people discover this latent knowledge. Classification is one of the most important data mining methods. Classification methods can be compared and evaluated according to five criteria: accuracy, speed, robustness, scalability, and interpretability. Of these five, predictive accuracy is the most important.

This thesis surveys and analyzes classification methods popular in China and abroad against these five criteria, including decision tree classification, Bayesian classification, classification based on neural networks, and classification based on support vector machines. Among these methods, the decision tree is one of the most widely adopted models. The thesis therefore focuses on the decision tree: it covers all major stages of the tree-building process, studies in depth the main problems decision trees currently face and their likely future development, and proposes several new ways to improve decision tree performance, contributing to the further application of the decision tree. Attribute selection, discretization, and dimension reduction are common to decision trees and other data mining methods, so improvements in these areas benefit not only decision trees but other data mining methods as well. The work is therefore of positive significance for the development of data mining technology.

The main research contributions are as follows:
(1) A novel dimension reduction algorithm is proposed.
(2) A weighted binary search algorithm is proposed to discretize continuous attributes.
(3) An improvement to the attribute selection criterion is proposed.
(4) Building on the work above, these improvements are optimized and integrated into the classical decision tree, and an improved algorithm procedure is proposed. Experimental results show its superiority over the C4.5 algorithm.
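For orientation, the baseline that contributions (2) and (3) set out to improve can be sketched as follows. This is a minimal illustration of the standard entropy-based gain ratio criterion and binary threshold search associated with C4.5, not the weighted binary search or the improved criterion proposed in the thesis, whose details the abstract does not give. The function names and the use of midpoints as candidate thresholds are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_binary_threshold(values, labels):
    """Exhaustively search for the cut point of a continuous attribute
    that maximizes information gain.

    Candidate thresholds are midpoints between adjacent distinct sorted
    values (an assumption of this sketch). Returns (threshold, gain), or
    (None, 0.0) if no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    total = len(pairs)
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, total):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = len(left) / total * entropy(left) + len(right) / total * entropy(right)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_cut = gain, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_gain
```

Calling `best_binary_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])` evaluates the three candidate cuts and selects the one that separates the two classes cleanly. The exhaustive scan over every candidate is the part a faster search strategy would aim to replace.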
Keywords/Search Tags:data mining, Decision Tree, discretization, dimension reduction, attribute choosing