Anti-Noise Fuzzy SLIQ Decision Trees Based On Sensitivity

Posted on: 2012-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: H T Zhang
GTID: 2218330338462752
Subject: Computer software and theory
Abstract/Summary:
Decision trees are among the most widely used techniques in data mining. They are popular because of their strong abilities in knowledge acquisition and knowledge representation. As ever larger volumes of data are produced, the knowledge embedded in that data is increasingly uncertain, and such uncertain knowledge is receiving growing attention. In the mid-1960s, Zadeh proposed fuzzy set theory, which gave people a more precise way to express fuzzy knowledge. Since then, many researchers have introduced fuzzy set theory into decision trees to overcome the sharp-boundary problem of traditional (crisp) decision trees. The ID3 algorithm was fuzzified relatively early, and more recently the SLIQ algorithm has also been extended to the fuzzy setting.
This thesis focuses on G-FDT, the fuzzy SLIQ algorithm proposed by Chandra et al. The fuzzy decision trees induced by G-FDT tend to degenerate into traditional crisp decision trees, and the thesis analyzes the reasons for this in detail. Motivated by the convexity defect that the traditional node-split evaluation function exhibits in the fuzzy setting, the thesis proposes a new fuzzy SLIQ algorithm: an anti-noise algorithm for inducing fuzzy decision trees based on the classification sensitivity of candidate attributes. Compared with G-FDT, this algorithm makes the following improvements:
(1) The discrimination functions that G-FDT constructs for candidate attributes are too narrow. The proposed method for determining discrimination functions avoids this problem at its root.
(2) The concept of candidate attribute sensitivity is introduced.
To address the convexity defect of the traditional heuristic split-test function in the fuzzy setting, the thesis proposes the concept of classification sensitivity, which measures the classification ability of a candidate attribute. An attribute with high classification sensitivity is assigned a relatively steep discrimination function, which makes it more likely to be selected.
(3) An outlier detection mechanism is introduced. G-FDT and the earlier SG-FDT algorithm have very weak resistance to noise, which weakens the knowledge-representation ability of the decision trees they induce. The improved algorithm therefore removes outliers from the current example set whenever it evaluates a candidate node split, making the resulting decision tree more stable and robust.
(4) Optimizations are proposed to make the computation more efficient. To improve the practicality of the induction algorithm, the thesis proposes several measures that reduce the high cost of the complex computations. These include additional termination criteria for node splitting and a check of whether a candidate attribute has already been used by the current node's ancestors.
The thesis carries out simulation experiments on the sensitivity-based anti-noise fuzzy SLIQ decision tree algorithm and analyzes the results. The experimental results indicate that the algorithm truly realizes fuzzy SLIQ induction, shows good robustness, and substantially improves the classification ability of the fuzzy decision trees it constructs.
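The abstract gives no formulas, so the following Python sketch only illustrates the general shape of a fuzzy SLIQ-style node split with outlier filtering; it is not the thesis's algorithm. It assumes a sigmoid discrimination function whose hypothetical `steepness` parameter stands in for whatever the thesis derives from classification sensitivity, a membership-weighted Gini index (SLIQ's split criterion), and Tukey's 1.5 * IQR boxplot rule for outliers (the keywords mention boxplots, but the exact rule is not stated). All function names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def boxplot_inliers(values, k=1.5):
    """Indices of values inside Tukey's boxplot fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    if len(values) < 4:  # too few points for meaningful quartiles; keep everything
        return list(range(len(values)))
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if low <= v <= high]


def left_membership(x, split, steepness):
    """Assumed sigmoid discrimination function: 1 / (1 + exp(steepness * (x - split))).

    Gives the degree to which x belongs to the left (x < split) branch; a larger
    (hypothetical) steepness makes the split closer to a crisp one.
    """
    z = steepness * (x - split)
    if z >= 0:  # numerically stable evaluation for both signs of z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fuzzy_gini(weights, labels):
    """Gini index where each example contributes its membership weight instead of a count of 1."""
    total = sum(weights)
    if total == 0:
        return 0.0
    per_class = defaultdict(float)
    for w, y in zip(weights, labels):
        per_class[y] += w
    return 1.0 - sum((c / total) ** 2 for c in per_class.values())


def split_impurity(xs, ws, ys, split, steepness):
    """Membership-weighted Gini of the two fuzzy children produced at `split`."""
    left = [w * left_membership(x, split, steepness) for x, w in zip(xs, ws)]
    right = [w - l for w, l in zip(ws, left)]  # each example's two memberships sum to w
    total = sum(ws)
    return (sum(left) / total) * fuzzy_gini(left, ys) + (sum(right) / total) * fuzzy_gini(right, ys)


def best_fuzzy_split(xs, ws, ys, steepness=5.0, remove_outliers=True):
    """Return (split_point, impurity) with minimal impurity, or (None, None) if no split exists."""
    keep = boxplot_inliers(xs) if remove_outliers else range(len(xs))
    xs = [xs[i] for i in keep]
    ws = [ws[i] for i in keep]
    ys = [ys[i] for i in keep]
    distinct = sorted(set(xs))  # SLIQ-style: scan the sorted attribute values once
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, None
    impurity, split = min((split_impurity(xs, ws, ys, c, steepness), c) for c in candidates)
    return split, impurity


if __name__ == "__main__":
    xs = [1.0, 1.2, 1.1, 5.0, 5.2, 5.1, 100.0]  # 100.0 is an injected outlier
    ys = ["a", "a", "a", "b", "b", "b", "a"]
    ws = [1.0] * len(xs)  # membership of each example at the current node
    print(best_fuzzy_split(xs, ws, ys, steepness=5.0))
```

With the outlier filtered out, the chosen split falls between the two clusters (near 3.1) with impurity close to 0. Passing fuzzy weights `ws` from a parent node, rather than all ones, is how such a sketch would propagate partial memberships down the tree.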
Keywords/Search Tags: decision tree, fuzzy set theory, sensitivity, SLIQ, boxplot