The Study Of Decision Tree Classification Algorithm In Data Mining

Posted on: 2016-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
GTID: 2308330461994503
Subject: Circuits and Systems
Abstract/Summary:
Data mining is an emerging field with very broad applications, and many of its problems merit further study. Classification is an important part of data mining and one of its research focuses. Decision tree classification algorithms are widely used because they are efficient, structurally simple, easy to understand, and highly accurate.
Building on a study and analysis of existing data mining technology, this thesis focuses on the C4.5 decision tree algorithm for classification. The main content comprises an overview of data mining, classification, and decision tree techniques; a detailed introduction to the C4.5 algorithm; an improved C4.5 algorithm; and its application to a practical case.
The innovation of this thesis is an improved C4.5 algorithm, which is applied as a decision-support aid in a practical commercial banking setting. The main idea addresses a known defect of C4.5: it has to scan the data multiple times while running, which lowers its efficiency. Two improvements are proposed. The first targets the special case in which the class attribute takes only two values, positive and negative examples: it combines the Taylor formula from higher mathematics with the characteristics of the information gain ratio calculation to simplify the logarithmic operations in the attribute-discrimination measure. The second improves the handling of continuous attributes. Standard C4.5 sorts the values of a continuous attribute, discretizes them, and compares the information gain ratio at every candidate split point to choose the test attribute; the improved algorithm searches for the best split point only among the boundary points, which improves the algorithm's running efficiency.
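The second improvement can be illustrated with a minimal Python sketch. It is not the thesis's MATLAB implementation: the function names and the toy data are hypothetical, and a boundary point is assumed to be a midpoint between two adjacent distinct attribute values whose class labels are not one and the same single class, which is a common definition that the thesis may refine. The sketch compares an exhaustive search over all candidate thresholds with a search restricted to boundary points, scoring each threshold by gain ratio.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split 'value <= threshold' vs 'value > threshold'."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def all_cut_points(values):
    """Every midpoint between adjacent distinct values (exhaustive candidates)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def boundary_cut_points(values, labels):
    """Midpoints between adjacent distinct values where the class can change.

    Assumed definition: a midpoint is skipped only when both neighbouring
    values carry the same single class.
    """
    classes_at = {}
    for x, y in zip(values, labels):
        classes_at.setdefault(x, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for a, b in zip(distinct, distinct[1:]):
        if classes_at[a] != classes_at[b] or len(classes_at[a]) > 1:
            cuts.append((a + b) / 2)
    return cuts


def best_cut(values, labels, candidates):
    """Candidate threshold with the highest gain ratio."""
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy data: one continuous attribute and a two-class label.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["n", "n", "n", "y", "y", "n", "y", "y"]

exhaustive = all_cut_points(values)
boundary = boundary_cut_points(values, labels)
best_exhaustive = best_cut(values, labels, exhaustive)
best_boundary = best_cut(values, labels, boundary)
print(len(exhaustive), len(boundary), best_exhaustive, best_boundary)
```

On this toy data the restricted search evaluates 3 thresholds instead of 7 and selects the same threshold as the exhaustive search. Agreement on one small example is an illustration only; it does not establish the equivalence that the thesis reports from its experiments.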
Simulation experiments in MATLAB compare C4.5 and the improved algorithm on 10 common UCI data sets. The conclusion is that the improved algorithm effectively improves performance and saves space without affecting the generated decision tree or the test accuracy. Finally, the improved C4.5 decision tree algorithm is used to model personal credit data from a German bank, obtained from the Internet. The experimental results show that the models are stable, efficient, accurate in prediction, and compact, so they meet the modeling requirements and are reasonable and feasible.
Keywords/Search Tags: Data Mining, C4.5 Algorithm, Improved Algorithm, Application