The Research And Optimization Of ID3 Algorithm

Posted on: 2018-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhang
Full Text: PDF
GTID: 2348330518497610
Subject: Applied Mathematics
Abstract/Summary:
With the development of information technology and the generation of massive data, traditional methods of manually processing and analyzing data can no longer meet practical needs. In the era of big data in particular, the volume of data is huge and its structure is complex, so traditional processing methods are not effective on such data. Having data and handling it reasonably and efficiently is critical for an enterprise, a government, and even a whole country, so research on massive data processing has attracted more and more attention. After decades of research, data mining theory has developed into the most important theory of data processing and analysis and has been widely used.

The decision tree classification algorithm is the most basic and most widely used analysis method in data mining, and this paper takes the classical ID3 algorithm as the basis of its research. Two aspects are improved and verified by examples: the method for selecting the training set and test set, and the attribute splitting criterion in the ID3 algorithm. The main contents of the paper are as follows:

Section 1 introduces the methods and steps of data preprocessing and preliminary analysis and the definition and methods of classification algorithms, and then expounds the theory and implementation steps of the ID3 algorithm.

Section 2 introduces commonly used methods for selecting the training set and test set, together with their advantages and disadvantages, and puts forward the weight classification method. First, the samples are grouped by major category, and a weight is determined for each category according to its proportion and properties. The weight then determines how many samples are selected from that category. The samples selected from all categories form the training set, and the remaining samples are combined into the test set.

Section 3 introduces weights for the splitting attributes on the basis of the classical ID3 algorithm. First, the training set is grouped by category and the central value of each group is calculated. Then the standard deviation of the central values is obtained, and the attribute weights are determined according to the standard deviation of each attribute.

Section 4 builds models with both the ID3 algorithm and the improved ID3 algorithm. The results show that the classification accuracy of the improved ID3 algorithm is significantly improved.

Finally, the paper points out the shortcomings of the research and the areas for further improvement.
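
For readers unfamiliar with the classical ID3 splitting criterion that the paper starts from, the following is a minimal Python sketch of entropy and information gain. It shows only the standard, unweighted criterion; the attribute weights proposed in Section 3 are not reproduced here because the abstract does not give their formula. The function names and the toy data (`rows`, `labels`, `outlook`, `windy`) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on the categorical attribute `attr`."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Classical ID3 choice: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: `outlook` separates the classes perfectly, `windy` not at all.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

ID3 applies this choice recursively: it splits on the selected attribute and repeats the procedure on each subset until the labels in a subset are pure or no attributes remain.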
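
The training and test set selection described for Section 2 can be roughly illustrated as follows. This is a sketch under a simplifying assumption: each category's weight is taken to be its share of the samples only, which makes the procedure equivalent to ordinary proportional stratified sampling. The abstract says the weight also depends on the category's properties, and that part is not modelled. The function name `weighted_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def weighted_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) chosen category by category."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    total = len(labels)
    train_size = round(train_fraction * total)
    train_idx = []
    for idx in by_class.values():
        weight = len(idx) / total  # assumption: weight = category proportion only
        k = min(len(idx), round(weight * train_size))
        train_idx.extend(rng.sample(idx, k))

    chosen = set(train_idx)
    # Everything not selected for training is combined into the test set.
    test_idx = [i for i in range(total) if i not in chosen]
    return sorted(train_idx), test_idx


# Hypothetical labels: six samples of category "a" and four of category "b".
example_labels = ["a"] * 6 + ["b"] * 4
train, test = weighted_split(example_labels, train_fraction=0.5)
```

Sampling within each category keeps the class proportions of the training set close to those of the full data set, which a purely random split does not guarantee for small or imbalanced data.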
Keywords/Search Tags:data preprocessing, logistic regression, ID3 algorithm, weight classification method, splitting attribute weight