The Research And Optimization Of ID3 Algorithm

Posted on:2018-12-17

Degree:Master

Type:Thesis

Country:China

Candidate:X X Zhang

Full Text:PDF

GTID:2348330518497610

Subject:Applied Mathematics

Abstract/Summary:

With the development of information technology and the generation of massive data, traditional manual methods of processing and analyzing data can no longer meet practical needs. In the era of big data in particular, data volumes are huge and data structures are complex, so traditional processing methods are ineffective. Possessing data and handling it reasonably and efficiently is critical for enterprises, governments, and even entire countries, so research on massive data processing has attracted increasing attention. After decades of research, data mining has developed into one of the most important theories of data processing and analysis and is widely used.

Decision tree classification is one of the most basic and most widely used analysis methods in data mining, and this thesis takes the classical ID3 algorithm as its starting point. Two aspects of ID3 are improved and verified by examples: the method for selecting the training and test sets, and the attribute splitting criterion. The main contents of the thesis are as follows:

Section 1 introduces the methods and steps of data preprocessing and preliminary analysis, the definition and methods of classification algorithms, and the theory and implementation steps of the ID3 algorithm.

Section 2 reviews commonly used methods for selecting training and test sets, together with their advantages and disadvantages, and proposes the weight classification method. The samples are first grouped by major category, and a weight is determined for each category according to its proportion and properties. The weight then determines how many samples of that category are selected. The samples selected from all categories form the training set, and the remaining samples are combined into the test set.

Section 3 introduces weights for the splitting attributes on the basis of the classical ID3 algorithm. The training set is first grouped by category, and the central values are calculated. The standard deviation of the central values is then obtained, and the attribute weights are determined according to the standard deviation of each attribute.

Section 4 builds models with both the ID3 algorithm and the improved ID3 algorithm. The results show that the improved ID3 algorithm achieves significantly higher classification accuracy.

Finally, the thesis points out the shortcomings of the research and the areas for further improvement.
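
As an illustration of the classical ID3 splitting criterion mentioned in the abstract, the following minimal Python sketch computes information gain and picks the attribute to split on. The `weights` argument is a hypothetical hook for the per-attribute weights described for Section 3. The abstract does not state how those weights enter the criterion, so multiplying the gain by the weight is an assumption, and the toy data and all function names are illustrative rather than taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Classical ID3 criterion: entropy reduction from splitting on attr."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs, weights=None):
    """Pick the attribute with the highest (optionally weighted) gain.

    ASSUMPTION: scaling gain by a per-attribute weight is only one
    possible way to use such weights; the thesis may combine them
    differently. With weights=None this is plain ID3.
    """
    weights = weights or {}
    return max(attrs,
               key=lambda a: weights.get(a, 1.0)
               * information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

In this toy data set, splitting on `outlook` yields two pure subsets, while splitting on `windy` leaves the class mix unchanged, so plain ID3 selects `outlook`. A full ID3 implementation would apply `best_attribute` recursively to each subset until the subsets are pure or no attributes remain.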

Keywords/Search Tags:

data preprocessing, logistic regression, ID3 algorithm, weight classification method, splitting attribute weight

Related items

1	K Nearest Neighbors Algorithm Based Multi-label Classification
2	Application Of Clustering Algorithm Based On Attribute Weighting In Bank Customer Segmentation
3	Research On Method Of Attribute Weight Based On Rough Sets Theory
4	Text Augmentation Method Based On Label Relevance Weight Filtering Mechanism In Sentiment Classification
5	Research And Application Of Multi Classification Logistic Regression Algorithm In Big Data Environment
6	Research On UBI Rate Determination Method Based On Entropy Weight-Topsis And Clustering
7	On Exact Algorithms For The Maximum Clique Problem
8	K-modes Cluster Analysis An Application Based On Attribute Value Weight
9	Research Of Weight Algorithm In KNN Text Classification
10	Kernel Logistic Regression For Imbalanced Data Classification