
Research On Discretization Of Continuous Attributes

Posted on: 2008-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: J F Ji
GTID: 2178360212995647
Subject: Computer software and theory
Abstract/Summary:
The rapid spread and wide use of computers have produced vast amounts of data and information, and many learning methods and algorithms are used to extract the hidden, useful knowledge stored in databases. Many learning algorithms require their input attributes to be discretized, which has given rise to many methods for discretizing continuous attributes: the intervals may be specified directly by domain experts, or the input space may be partitioned according to some principle, with cut points defining the discretization. Discretization methods can be classified as supervised or unsupervised; as global or local, depending on whether all continuous attributes are discretized at the same time or one attribute at a time; and as static or dynamic, depending on whether the intervals are determined before or during classification. Common discretization strategies include equally spaced (equal-width) division, adaptive methods, equal-frequency division, and methods based on class information entropy. With these methods, however, it is hard to obtain a discretization result that is direct and easy to understand.

In this paper, the common discretization algorithms and methods are introduced first. We then propose a new method for discretizing continuous attributes based on obtaining linguistic summaries and linguistic rules from the database. This approach has the following advantages:
(1) The discretization result is easy to read. Knowledge hidden in a database is hard to find by reading the database directly.
By contrast, the discretization result produced by the method proposed in this paper is easy to understand.
(2) Every result obtained has a definite support degree, so the summaries and rules whose support degree is higher than a given threshold can be selected to meet different requirements.
(3) The extraction process has a higher degree of machine intelligence: it only requires a threshold for each linguistic term, and it yields the summaries and rules in natural language.

In the extraction process, the membership function of each linguistic term is either constructed by experts from the distribution of the attribute values or generated as a suitable membership function for the term, and an optimized discretization can then be obtained with a genetic algorithm (GA). Once every linguistic term has a membership function, the membership degree of every object is computed, and the objects whose membership degree is higher than the given threshold are collected into a set. The object sets of the other linguistic terms are obtained in the same way. The intersection of these sets is the set of objects that satisfy all of the specified conditions, and the summaries and rules are the sentences that describe this set. Rules are obtained from the database by essentially the same process as summaries.

Throughout the discretization process, the Iris database serves as the running example. Linguistic summaries and rules are obtained from it, and the rules are finally used to judge (classify) several objects; the good results obtained support the rules.
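Two of the common strategies listed above, equally spaced (equal-width) division and equal-frequency division, are unsupervised and simple enough to sketch directly. The following Python sketch is illustrative only, not the thesis's code: the function names, the number of intervals `k`, and the sample values are hypothetical, and the equal-frequency variant simply places cuts at sorted positions, so tied values can make intervals uneven.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """Cut points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Cut points chosen so each interval holds roughly len(values) / k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values, cuts):
    """Map each value to its interval index; a value equal to a cut goes right."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical, deliberately skewed sample (not data from the thesis).
sample = [0, 1, 2, 3, 4, 5, 30]
width_codes = discretize(sample, equal_width_cuts(sample, 3))
freq_codes = discretize(sample, equal_frequency_cuts(sample, 3))
print(width_codes)  # [0, 0, 0, 0, 0, 0, 2]
print(freq_codes)   # [0, 0, 1, 1, 2, 2, 2]
```

With a single outlier, equal-width division puts six of the seven values into the first interval and leaves the middle interval empty, while equal-frequency division spreads them out. Neither strategy looks at class labels, which is what makes both unsupervised.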
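Methods based on class information entropy are supervised: they use the class labels to place cut points. A minimal sketch of a single binary split, assuming candidate cuts at midpoints between adjacent distinct values and choosing the cut that minimizes the size-weighted class entropy of the two sides. This is a generic version of the idea, not necessarily the variant surveyed in the thesis; the function names and toy data are hypothetical, and it omits the stopping criterion a recursive multi-interval version would need.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the class distribution in `labels`."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Binary cut minimizing the size-weighted class entropy of both sides.

    Candidate cuts are midpoints between adjacent distinct sorted values.
    Returns (cut, weighted_entropy), or (None, inf) if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a cut between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if e < best_e:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_e = e
    return best_cut, best_e


# Hypothetical attribute values with class labels (not data from the thesis).
vals = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0, 2.5]
labs = ["a", "a", "a", "b", "b", "b", "a"]
cut, e = best_entropy_cut(vals, labs)
print(cut, e)  # 3.75 0.0
```

Here the cut at 3.75 separates the two classes perfectly, so both sides are pure and the weighted entropy is zero.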
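The summary-extraction steps described above (a membership degree for every object, a threshold filter per linguistic term, then the intersection of the resulting object sets) can be sketched as follows. Everything concrete here is a hypothetical assumption: the trapezoidal membership shape, the term names and breakpoints, the toy objects, and the definition of support as the fraction of all objects covered. In the thesis the membership functions come from experts or a GA. "Higher than the given threshold" is implemented as a strict `>`.

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function: 1 on [b, c], 0 outside (a, d), linear between."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def objects_above(objects, attr, mu, threshold):
    """Indices of objects whose membership degree for the term exceeds the threshold."""
    return {i for i, obj in enumerate(objects) if mu(obj[attr]) > threshold}


def summarize(objects, conditions, threshold):
    """Intersect the object sets of all (attribute, membership) conditions.

    Returns the covered indices and a support degree, assumed here to be
    the fraction of all objects that are covered. `conditions` must be non-empty.
    """
    sets = [objects_above(objects, attr, mu, threshold) for attr, mu in conditions]
    covered = set.intersection(*sets)
    return covered, len(covered) / len(objects)


# Hypothetical linguistic terms and toy objects (not values from the thesis).
SHORT_LENGTH = trapezoid(0.0, 0.0, 2.0, 3.0)
NARROW_WIDTH = trapezoid(0.0, 0.0, 0.5, 1.0)
objects = [
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 1.3, "petal_width": 0.2},
    {"petal_length": 4.7, "petal_width": 1.4},
    {"petal_length": 4.5, "petal_width": 1.5},
    {"petal_length": 6.0, "petal_width": 2.5},
    {"petal_length": 5.1, "petal_width": 1.9},
]
covered, support = summarize(
    objects,
    [("petal_length", SHORT_LENGTH), ("petal_width", NARROW_WIDTH)],
    threshold=0.5,
)
print(f"Petal length is short and petal width is narrow: "
      f"objects {sorted(covered)}, support {support:.2f}")
```

The printed sentence is the linguistic summary; a caller would keep it only if its support is higher than the chosen support threshold.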
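The abstract notes that rules are obtained by essentially the same process as summaries and are then used to judge objects. A hedged sketch of that idea: a rule is an antecedent (linguistic terms over attributes) plus a class; its support is assumed to be the fraction of objects that both satisfy the antecedent and carry the class; rules whose support is higher than a minimum are kept; and a new object is judged by the first kept rule it satisfies. The rule format, the first-match strategy, the term definitions, and the toy data are all hypothetical.

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function: 1 on [b, c], 0 outside (a, d), linear between."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def matches(obj, antecedent, threshold):
    """True if the object's membership in every antecedent term exceeds the threshold."""
    return all(mu(obj[attr]) > threshold for attr, mu in antecedent)


def rule_support(objects, labels, antecedent, cls, threshold):
    """Fraction of all objects that satisfy the antecedent and belong to class `cls`."""
    hits = sum(
        1 for obj, lab in zip(objects, labels)
        if lab == cls and matches(obj, antecedent, threshold)
    )
    return hits / len(objects)


def classify(obj, rules, threshold):
    """Class of the first rule whose antecedent the object satisfies, else None."""
    for antecedent, cls in rules:
        if matches(obj, antecedent, threshold):
            return cls
    return None


# Hypothetical terms, toy objects, labels and thresholds (not from the thesis).
SHORT = trapezoid(0.0, 0.0, 2.0, 3.0)  # petal length "short"
LONG = trapezoid(4.0, 5.0, 7.0, 7.0)   # petal length "long"
objects = [{"petal_length": v} for v in (1.4, 1.3, 4.7, 4.5, 6.0, 5.1)]
labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
candidates = [
    ([("petal_length", SHORT)], "setosa"),    # IF length is short THEN setosa
    ([("petal_length", LONG)], "virginica"),  # IF length is long THEN virginica
    ([("petal_length", LONG)], "setosa"),     # IF length is long THEN setosa
]
THRESHOLD, MIN_SUPPORT = 0.8, 0.2
kept = [
    (ante, cls) for ante, cls in candidates
    if rule_support(objects, labels, ante, cls, THRESHOLD) > MIN_SUPPORT
]
print(len(kept))                                         # 2
print(classify({"petal_length": 5.8}, kept, THRESHOLD))  # virginica
print(classify({"petal_length": 3.5}, kept, THRESHOLD))  # None
```

First-match is only the simplest way to apply the kept rules; ranking the matching rules by support or by membership degree would be equally plausible, and the abstract does not say which approach the thesis uses.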
Keywords/Search Tags: Information system, Continuous attributes, Attribute discretization, Language summary, Language rule