Application Of Information Theory In The Discretization Of Continuous Attributes Of The Rough Set

Posted on: 2011-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: H L Yue
Full Text: PDF
GTID: 2208330332956557
Subject: Computer application technology
Abstract/Summary:
Knowledge discovery in databases (KDD) is today one of the most active fields in computer science and artificial intelligence, and rough set theory has become an important theory in KDD because of its distinctive advantages. Discretization is a necessary data pre-processing stage when knowledge is discovered from a database using rough set theory. In this paper, we propose two discretization algorithms for decision tables, based on information theory and rough set theory. To keep both algorithms reliable and efficient, we use the relevant concepts of information theory, so that little of the information in the decision table is lost. Experiments show that both algorithms are effective.

The main work of this paper is as follows:

1) Discussing our research background and pointing out that discretization is a necessary pre-processing step for KDD using rough sets. We then conduct a comprehensive survey of the current state of research on discretization;

2) Introducing the decision table, which is important for data representation in the KDD field, and giving a formal definition of the decision table based on rough sets. The discretization process is described formally, and the evaluation of discretization results is analyzed;

3) Describing the basic concepts of information theory. A relationship between knowledge and information is established, and on that basis an information-theoretic representation of the concepts and operations of rough set theory is given. In addition, the equivalence between the information representation and the algebraic representation of knowledge reduction is proved. These conclusions help in understanding the essence of rough sets and in seeking new rough-set-based KDD algorithms;

4) Proposing two discretization algorithms for decision tables, both based on the statistical concepts of information theory. To ensure that the inconsistency (incompatibility) of the decision table does not change, both algorithms use the inconsistency of the decision table as the stopping condition for discretization;

5) Implementing both algorithms in the VC++ 6.0 environment. We classify the discretized data using C4.5 and SVM as implemented on the DMBench platform. Comparison with other discretization algorithms shows that our algorithms are effective.
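The abstract does not give the details of the two algorithms, so the following is only a minimal Python sketch of the general idea it describes: choose cut points with an information-theoretic measure, and stop once the discretized decision table is no more inconsistent than the original. It is not the thesis's method. The function names (`entropy`, `conditional_entropy`, `inconsistent_count`, `discretize`, `greedy_cuts`), the greedy strategy, the midpoint candidate cuts, and the restriction to a single condition attribute are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy H(D) of a list of decision values, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(conditions, labels):
    """H(D | C): entropy of the decision inside each condition class,
    weighted by the relative size of the class."""
    groups = defaultdict(list)
    for cond, label in zip(conditions, labels):
        groups[cond].append(label)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def inconsistent_count(conditions, labels):
    """Number of objects whose condition value also occurs with a
    different decision value (a simple inconsistency measure)."""
    decisions = defaultdict(set)
    for cond, label in zip(conditions, labels):
        decisions[cond].add(label)
    return sum(1 for cond in conditions if len(decisions[cond]) > 1)


def discretize(values, cuts):
    """Map each continuous value to the index of its interval."""
    return [sum(v >= c for c in cuts) for v in values]


def greedy_cuts(values, labels):
    """Assumed greedy scheme: repeatedly add the candidate cut that gives
    the lowest H(D | discretized attribute), and stop as soon as the
    discretized table is no more inconsistent than the original one."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    target = inconsistent_count(values, labels)
    cuts = []
    while candidates and inconsistent_count(discretize(values, cuts), labels) > target:
        best = min(
            candidates,
            key=lambda c: conditional_entropy(discretize(values, cuts + [c]), labels),
        )
        cuts = sorted(cuts + [best])
        candidates.remove(best)
    return cuts


# Toy single-attribute decision table (made-up data for illustration).
values = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(greedy_cuts(values, labels))  # [2.5]
```

On this toy table a single cut at 2.5 already separates the two decision classes, so the inconsistency-based stopping condition ends the search after one step. A real decision table has several condition attributes, and inconsistency would then be checked on the combination of all discretized attributes rather than on one column.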
Keywords/Search Tags: KDD, Rough Set, Decision Table, Information Theory, Discretization