Research On Decision Tree Classification Based On Discrete Attribute

Posted on:2018-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:K Zhao

Full Text:PDF

GTID:2348330512477208

Subject:Computer Science and Technology

Abstract/Summary:

Data mining is the process of discovering knowledge in large volumes of existing data. In recent years, the intelligent extraction of knowledge from large amounts of data has attracted wide attention. Data mining covers classification, clustering, and other mining methods. Decision tree algorithms are simple, efficient, and easy to understand as a means of knowledge extraction, and they therefore occupy an irreplaceable position in the field. Existing decision tree algorithms rely primarily on Shannon's information entropy to compute the criterion for splitting a tree node. Computing information entropy requires repeated logarithm evaluations, which lowers classification efficiency. In addition, when the split criteria of two or more attributes are equal or approximately equal, existing algorithms choose among them at random and cannot discriminate further between them, which reduces the accuracy of the predicted classification.

This thesis makes the following improvements to address these shortcomings. First, to address the low efficiency of decision tree classification, an optimized function is proposed as the attribute selection criterion; it avoids costly logarithm operations and improves CPU utilization. Comparative experiments show that the optimized function effectively improves classification efficiency and CPU utilization. Second, when the criterion values of two or more attributes are equal or differ by less than a certain threshold, existing algorithms randomly select one of them as the next split attribute, which lowers the accuracy of the decision tree. To improve classification accuracy, a new attribute selection method is introduced; experiments show that it raises accuracy on some data sets. Third, to address low accuracy and overfitting of the decision tree, a method based on classification rules is introduced. Random samples are drawn from the data set, and the improved algorithm, G_DT, is used to generate several classifiers. The best rules are then selected from these classifiers to produce the final classifier. Compared with earlier decision tree algorithms, the resulting method is both faster and more accurate.
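
The contrast between an entropy-based split criterion and a logarithm-free one can be illustrated with a short Python sketch. The abstract does not give the form of the proposed optimized function or of the new attribute selection method, so this sketch rests on assumptions: Gini impurity stands in as a well-known log-free criterion, and the tie-breaking rule (prefer the attribute with fewer distinct values, then the lower index) is purely hypothetical. All function names and the `tol` parameter are illustrative and do not come from the thesis.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels (one log2 per class)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: a log-free stand-in, NOT the thesis's function."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_score(rows, labels, attr, impurity):
    """Impurity reduction obtained by splitting on discrete attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted


def best_attribute(rows, labels, impurity, tol=1e-9):
    """Pick the split attribute, breaking near-ties deterministically.

    Hypothetical tie-break: among attributes whose score is within `tol`
    of the best, prefer fewer distinct values, then the lower index.
    """
    scores = [split_score(rows, labels, a, impurity)
              for a in range(len(rows[0]))]
    top = max(scores)
    tied = [a for a, s in enumerate(scores) if top - s <= tol]
    return min(tied, key=lambda a: (len({row[a] for row in rows}), a))
```

Passing `entropy` or `gini` as the `impurity` argument selects the criterion, so the two can be timed against each other on the same data. The deterministic tie-break replaces the random choice that the abstract identifies as a source of lost accuracy, though the actual rule used in the thesis may differ.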

Keywords/Search Tags:

Data Mining, Decision Tree, Classification Rules, Information Gain

Related items

1	Decision Tree Methods In Data Mining And Customer Classification
2	Inductive Decision Tree Classification Model In The Military Transport Vehicle Management System
3	The Research On The Algorithms Of Optimizing Decision Tree Classification
4	Research And Application Of Classification Algorithm Based On Decision Tree Rules
5	Based On Decision Tree Classification Method
6	The Application Research Of Data Mining In GUILIN Tourism Information
7	The Study And Application Of Media Item Communications Analysis With Association Rules And Decision Tree Algorithm
8	Research And Application On Decision Tree In Data Mining
9	The Application Of Decision Tree In Vocational Colleges Employment
10	Decision Tree Classification Method And Its Railway Ticket Marketing Analysis