
Research On Decision Tree Classification Based On Discrete Attribute

Posted on: 2018-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhao
Full Text: PDF
GTID: 2348330512477208
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is the process of discovering knowledge in large volumes of existing data. In recent years, the intelligent extraction of knowledge from large amounts of data has attracted wide attention. Data mining includes classification, clustering, and other mining methods. Decision tree algorithms are simple, efficient, and easy to understand as a means of knowledge extraction, and hence occupy an irreplaceable position in the field. Existing decision tree algorithms rely primarily on Shannon's information entropy to compute the criterion for splitting a tree node. Computing information entropy requires repeated logarithm operations, which lowers classification efficiency. In addition, when the split criteria of several attributes are equal or approximately equal, existing algorithms have no further way to choose among them and pick one at random, which reduces the predicted classification accuracy.

This thesis makes the following improvements to address these disadvantages of existing decision tree algorithms. First, to address the low efficiency of decision tree classification, an optimized function for the attribute judgment standard is proposed that avoids complex logarithm operations and improves CPU utilization. Comparative experiments show that the optimized function effectively improves both classification efficiency and CPU utilization. Second, when the judgment-standard values of two or more attributes are equal or differ by less than a certain threshold, existing algorithms randomly select one of them as the next split node, which lowers the accuracy of the decision tree. To improve classification accuracy, a new attribute judgment method is introduced; experiments show that it raises accuracy on some data sets. Third, to address the low accuracy and overfitting of decision trees, a method based on classification rules is introduced. The data set is randomly sampled, and the improved algorithm, G_DT, is used to generate several classifiers. The best rules are then selected from these classifiers to produce a single best classifier, which serves as the final classifier. Compared with existing decision tree algorithms, the proposed approach improves both classification efficiency and classification accuracy.
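For orientation, the entropy-based baseline that the abstract criticizes can be sketched as follows. This is a minimal illustration of the standard information-gain criterion for discrete attributes, not the thesis's optimized function or its G_DT algorithm, neither of which is specified in the abstract. The Gini impurity is included only as a well-known example of a criterion that needs no logarithm; the helper names (`entropy`, `information_gain`, `gini`, `tied_attributes`), the tolerance `eps`, and the toy data are all assumptions made for this sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels (one log2 call per class)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the discrete attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gini(labels):
    """Gini impurity: a standard log-free impurity measure (NOT the thesis's function)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def tied_attributes(rows, labels, attrs, eps=1e-9):
    """Attributes whose gain is within `eps` of the best gain (hypothetical tolerance)."""
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    best = max(gains.values())
    return [a for a in attrs if best - gains[a] <= eps]


# Toy data (invented for illustration): "temp" mirrors "outlook", so the two tie.
rows = [
    {"outlook": "sunny", "temp": "hot", "windy": "no"},
    {"outlook": "sunny", "temp": "hot", "windy": "yes"},
    {"outlook": "rain", "temp": "cool", "windy": "no"},
    {"outlook": "rain", "temp": "cool", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

ties = tied_attributes(rows, labels, ["outlook", "temp", "windy"])
```

Here `ties` contains both `outlook` and `temp`, which is the situation the abstract describes: the criterion alone cannot separate the candidates, so a baseline algorithm would have to pick one arbitrarily, whereas the thesis proposes an additional judgment method for this case.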
Keywords/Search Tags: Data Mining, Decision Tree, Classification Rules, Information Gain