Information-gain Based Quantization Algorithm And Its Application In Decision Tree Study

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: B B Deng
Full Text: PDF
GTID: 2308330485469591
Subject: Control engineering
Abstract/Summary:
Data mining is the complex process of intelligently extracting potentially valuable patterns, rules, and knowledge from massive data. In the era of big data, data mining technology plays an important role across industries and is increasingly valued. Classification is an important data analysis method in data mining, and classification technology is now widely used in fields such as education, finance, and medical treatment. There are many classification methods, such as decision trees, KNN, SVM, and genetic algorithms. Among them, the decision tree classification algorithm has been widely studied and applied because its theory is simple and easy to understand. However, the decision tree classification algorithm can only be applied to discrete data. Continuous data must therefore be quantized first, and the decision tree classification algorithm applied afterwards, in order to improve the accuracy of the decision tree classification algorithm. This thesis mainly studies quantization algorithms for continuous attribute data and the influence of quantization on the accuracy of the decision tree algorithm.
The main contributions of this thesis are as follows. Firstly, based on the Chinese and English literature, research on decision tree classification algorithms and clustering algorithms is surveyed and compared (see Chapter 1). Secondly, the basic concepts of decision tree classification algorithms and clustering algorithms are introduced (see Chapter 2). Thirdly, quantization algorithms based on information gain, alone and in combination with clustering algorithms, are studied; these are collectively referred to as information-gain-based quantization algorithms (see Chapter 3). Fourthly, the quantized data are used in the decision tree, and the resulting classification accuracies are compared in order to analyze the advantages of these quantization algorithms. The procedure is as follows: first, the continuous data set is discretized using a quantization algorithm; secondly, the decision tree algorithm is applied to the discretized data and the classification accuracy is recorded; then the decision tree classification accuracies are compared to see whether, and by how much, accuracy has improved. With classification accuracy as the criterion, the quantization algorithm that yields the highest classification accuracy is selected as the best quantization method for that data set (see Chapter 4). Finally, the experiments were carried out on the Eclipse platform, with the quantization algorithms implemented in Java and the data sets stored in a MySQL database. The data were taken from the UCI data sets and quantized to obtain quantized data sets. The source code of the C4.5 decision tree classification algorithm from the WEKA data mining platform was then used to classify the quantized data, and the classification accuracy before and after the improvement was compared (see Chapter 4).
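To make the idea of information-gain-based quantization concrete, the following is a minimal sketch of choosing a single cut point for one continuous attribute by maximizing information gain. It is not the thesis's algorithm (which was written in Java and may combine clustering); the function names `entropy`, `best_cut_point`, and `quantize`, the binary split, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain) for the binary split with the highest information gain.

    Hypothetical helper: candidate cuts are midpoints between adjacent
    distinct attribute values after sorting.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    base = entropy([lab for _, lab in pairs])
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Entropy after the split, weighted by the size of each side.
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return best_cut, best_gain


def quantize(values, cut):
    """Map each continuous value to a discrete bin index (0 or 1)."""
    return [0 if v <= cut else 1 for v in values]


# Toy data (assumed): one continuous attribute and a two-class label.
values = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = ["a", "a", "a", "b", "b", "b"]
cut, gain = best_cut_point(values, labels)
bins = quantize(values, cut)
```

The discretized column `bins` could then be passed to a decision tree learner, and the resulting accuracy compared against other quantization methods, as in the procedure described above.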
Keywords/Search Tags: Data mining, Decision tree algorithm, Quantization algorithm, Classification accuracy