Improvement And Research Of C4.5 Algorithm Based On K-means

Posted on: 2020-03-16, Degree: Master, Type: Thesis
Country: China, Candidate: Y K Xi, Full Text: PDF
GTID: 2428330578456463, Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, the value of data has become increasingly prominent, and the data accumulated across industries over the years holds great potential for mining. Data mining technology is therefore developing rapidly, and every accurate data analysis result can bring substantial benefits to industry. To obtain analysis results more quickly and accurately, this thesis focuses on data mining algorithms. When the traditional C4.5 algorithm faces a large number of multidimensional continuous attribute values, its traditional discretization method tends to cause low classification accuracy and low operating efficiency. To address this, the thesis proposes two methods for discretizing continuous attribute values. The first is the ten-equal-division discretization method: after the continuous attribute values are sorted, the values at the ten equal-division points are taken as the candidate split points for calculation. The second discretizes continuous attribute data with the K-means algorithm: the unlabeled continuous data are first combined with their corresponding class labels to form the data to be clustered, K-means then generates several clusters, and the approximate boundary points of the clusters are taken as the candidate split points of the continuous attribute, for which the information gain ratio is calculated. Experimental results show that, compared with the traditional C4.5 algorithm, the ten-equal-division method gives C4.5 higher execution efficiency, while the K-means-based discretization gives the C4.5 decision tree model higher classification accuracy.
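The two candidate-generation strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names are hypothetical, the K-means step clusters the attribute values alone in one dimension with a deterministic initialization (a simplification of the procedure summarized above), and the "approximate boundary point" is assumed to be the midpoint between the largest value of one cluster and the smallest value of the next.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5 gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    p_left, p_right = len(left) / len(labels), len(right) / len(labels)
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def ten_equal_candidates(values):
    """Values found at the nine interior ten-equal-division positions."""
    s = sorted(values)
    n = len(s)
    positions = {min(n - 1, (i * n) // 10) for i in range(1, 10)}
    return sorted({s[p] for p in positions})


def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's algorithm on one attribute (assumed simplification)."""
    s = sorted(values)
    n = len(s)
    # Deterministic start: evenly spaced order statistics as initial centers.
    centers = [s[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    clusters = [s]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in s:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]


def kmeans_candidates(values, k):
    """Midpoints between adjacent clusters, used as candidate split points."""
    clusters = sorted(kmeans_1d(values, k), key=min)
    return [(max(a) + min(b)) / 2 for a, b in zip(clusters, clusters[1:])]


def best_split(values, labels, candidates):
    """Candidate threshold with the highest gain ratio."""
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))
```

In this sketch, the ten-equal-division route evaluates at most nine thresholds per attribute regardless of how many distinct values it has, which is where a speed-up over testing every adjacent pair would come from. The K-means route evaluates at most k - 1 thresholds, placed where the data naturally separates.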
Keywords/Search Tags: C4.5 algorithm, K-means algorithm, Continuous attribute, Discretization