
An Improved Algorithm Of The ID3 Based On Impact Factors

Posted on: 2015-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: X F He
GTID: 2348330518470446
Subject: Computer software and theory
Abstract/Summary:
Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to help humans extract useful information (knowledge) from the rapidly growing volumes of digital data. Consequently, data mining has become an increasingly important research area. Classification is one of the most important data mining techniques, and classification methods based on decision trees have been influential in machine learning research. Among decision tree algorithms, ID3, proposed by Quinlan in 1986, is one of the most influential. ID3 chooses the attribute with the highest information gain as the splitting attribute; the aim is to minimize the entropy of the system after the split and to obtain a shallower decision tree, thereby improving the speed and accuracy of the algorithm. However, the information gain criterion tends to favor attributes with more values. To overcome this problem, this thesis proposes an improved ID3 algorithm based on impact factors. By introducing the concept of impact factors, the algorithm takes into account the impact factors of both the attributes and their values, and it uses the resulting improved information gain as the criterion for selecting splitting attributes. Because attributes with more values are more likely to have smaller impact factors, the improved algorithm overcomes the bias toward multi-valued attributes to some extent. In addition, to reduce the effect of noise and outliers in the training dataset, the improved algorithm adopts a pre-pruning technique based on the misclassification ratio that terminates some branches early. Results show that the improved algorithm overcomes the bias toward multi-valued attributes to some extent and, although it requires more computation, outperforms ID3 in both accuracy and the structure of the generated decision tree.
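To make the ID3 splitting criterion and the multi-valued bias described above concrete, here is a minimal Python sketch of standard information gain, assuming categorical attributes stored as dicts. The toy data, the attribute names `id` and `outlook`, and the `should_stop` pre-pruning rule with its 0.1 threshold are hypothetical illustrations, not the thesis's method. The impact-factor weighting itself is not reproduced because the abstract does not give its formula.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 gain: entropy before the split minus the weighted entropy
    of the partitions obtained by splitting on attribute `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


def misclassification_ratio(labels):
    """Fraction of samples a majority-class leaf would misclassify."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1 - majority_count / len(labels)


def should_stop(labels, threshold=0.1):
    """Hypothetical pre-pruning rule: make the node a leaf once its
    misclassification ratio is at or below `threshold`."""
    return misclassification_ratio(labels) <= threshold


# Hypothetical toy data: "id" is unique per row, "outlook" has two values.
rows = [
    {"id": 1, "outlook": "sunny"},
    {"id": 2, "outlook": "sunny"},
    {"id": 3, "outlook": "rain"},
    {"id": 4, "outlook": "rain"},
    {"id": 5, "outlook": "sunny"},
    {"id": 6, "outlook": "rain"},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

for attr in ("id", "outlook"):
    print(attr, round(information_gain(rows, labels, attr), 3))
print("chosen:", best_attribute(rows, labels, ["id", "outlook"]))
```

Because every partition induced by the unique `id` attribute is pure, its gain equals the full label entropy (1.0 bit), so plain ID3 prefers it over `outlook` (about 0.082 bits) even though a split on a per-row identifier cannot generalize. This is the kind of multi-valued bias that the impact factors are intended to counteract.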
Keywords: Data mining, Decision trees, ID3, Impact factors