A Study For Discretization Of Real Value Attributes Based On Rough Set Theory

Posted on: 2011-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2178330332961701
Subject: Computer application technology
Abstract/Summary:
Discretizing continuous feature values is an effective way to handle continuous attributes in machine learning and data mining. Some rule-extraction and feature-classification algorithms can only handle categorical attributes. Discretization partitions a continuous attribute into a finite set of adjacent intervals, producing an attribute with a small number of distinct values. The soundness of a discretization is judged by how accurately information can still be expressed and extracted afterwards. Among current methods, the Chi2 family of algorithms is well known for discretization based on probability and statistics, while the algorithms based on class-attribute interdependency are well known for discretization based on information theory. The key problem for a discretization algorithm is how to obtain the optimal partition, that is, one that preserves as much information as possible and minimizes information loss. First, this paper analyzes the Class-Attribute Interdependency Maximization (CAIM) discretization algorithm, proposed by Lukasz A. Kurgan and Krzysztof J. Cios and based on information theory, and proposes a modified version of it. In CAIM, the discretization criterion only accounts for the tendency to maximize the number of values belonging to the leading class within each interval. Because of this weakness, CAIM may generate unreasonable discretization results, which in turn lowers the predictive accuracy of a classifier. The modified CAIM algorithm takes into account the importance of attributes, ordered from smallest to largest, and introduces a concept of attribute discernibility rate based on rough sets.
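The CAIM criterion discussed above can be sketched as follows. This is a minimal illustration of the published criterion, CAIM = (1/n) * sum over intervals r of max_r^2 / M_r, where max_r is the count of the leading class in interval r and M_r is the total count in that interval. It is not the modified algorithm of this thesis. The function name `caim`, the argument layout, and the half-open interval convention (lower, upper] are assumptions made for the example.

```python
from bisect import bisect_left
from collections import Counter


def caim(values, labels, cut_points):
    """CAIM criterion for one attribute (hypothetical helper).

    cut_points are the interior boundaries; a value v falls into the
    interval (cut[k-1], cut[k]], which is an assumed convention.
    """
    cuts = sorted(cut_points)
    n_intervals = len(cuts) + 1
    # One class-count table per interval (the columns of the quanta matrix).
    counts = [Counter() for _ in range(n_intervals)]
    for v, y in zip(values, labels):
        counts[bisect_left(cuts, v)][y] += 1

    total = 0.0
    for c in counts:
        m = sum(c.values())          # M_r: all values in the interval
        if m:                        # empty intervals contribute nothing
            total += max(c.values()) ** 2 / m   # max_r^2 / M_r
    return total / n_intervals


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
print(caim(values, labels, [3.5]))   # boundary that separates the classes
print(caim(values, labels, [2.5]))   # boundary that mixes the classes
```

Only the leading-class count and the interval total enter the sum, so an interval with class counts (5, 5, 0) scores the same as one with counts (5, 3, 2). That is the limitation the abstract describes.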
Both the attribute discernibility rate and the approximation quality are used to determine the discretization intervals, which effectively resolves the problem of over-discretization. Second, the Chi2-related algorithms and the algorithms based on class-attribute interdependency are analyzed, and a novel algorithm for discretizing continuous attributes based on rough set theory is proposed. Rough set theory requires that discretization preserve the indiscernibility relation of the original decision system; many earlier algorithms, however, allow the approximation quality to decrease within a controlled range. This paper proposes a novel method of splitting intervals. The new algorithm discretizes continuous attributes more reasonably and effectively, and guarantees that the decision attributes are left unchanged.
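The consistency requirement above can be checked with the standard rough-set approximation quality: the fraction of objects whose condition-attribute values determine the decision uniquely. The sketch below compares that quantity for different cut points. The helper names `discretize` and `approximation_quality` and the toy data are assumptions for illustration. The thesis's attribute discernibility rate is not defined in the abstract, so it is not sketched here.

```python
from bisect import bisect_left
from collections import defaultdict


def discretize(rows, cuts_per_attribute):
    """Map each value to the index of its interval (hypothetical helper).

    cuts_per_attribute holds one sorted list of cut points per attribute.
    """
    return [
        tuple(bisect_left(cuts, v) for v, cuts in zip(row, cuts_per_attribute))
        for row in rows
    ]


def approximation_quality(rows, decisions):
    """Share of objects lying in the positive region of the decision."""
    groups = defaultdict(list)
    for row, d in zip(rows, decisions):
        groups[tuple(row)].append(d)      # indiscernibility classes
    # A class is consistent when all of its objects share one decision.
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(rows)


rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
decisions = [0, 0, 1, 1]

good = discretize(rows, [[2.5]])   # keeps objects with different decisions apart
bad = discretize(rows, [[3.5]])    # merges 3.0 with objects of the other decision
print(approximation_quality(good, decisions))
print(approximation_quality(bad, decisions))
```

A discretization that keeps the approximation quality equal to that of the original data leaves the consistency of the decision system intact. One that lowers it has merged objects with different decisions into the same interval.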
Keywords/Search Tags: Discretization of real value attributes, Rough sets, The importance of attributes, Discretization interval, Data mining