Rough set theory, introduced by Z. Pawlak in the early 1980s, is a mathematical theory for reasoning about data that can also deal with vague and uncertain problems. Since the 1990s it has attracted much attention from researchers around the world and has become a focus in computer science and information science. Over roughly twenty years of development, rough set theory has been successfully applied in many areas, including machine learning, pattern recognition, decision analysis, process control, knowledge discovery in databases, and expert systems.

This thesis studies the key steps in applying rough sets to knowledge discovery, namely discretization, attribute reduction, and the acquisition of decision rules, and proposes corresponding algorithms.

In Chapter 1, the basic tasks, steps, and methods of knowledge discovery are introduced, followed by the background and development of rough set theory. The advantages of applying rough set theory to knowledge discovery are also pointed out.

In Chapter 2, the basic concepts of rough set theory are introduced, and rough set theory is compared with other theories for dealing with vagueness.

In Chapter 3, the discretization problem is studied from two aspects: heuristic methods and methods based on genetic algorithms. A more efficient strategy for determining candidate cuts is proposed, in contrast to the strategy of the method based on rough sets and Boolean reasoning presented by Nguyen. Based on this strategy, an improved algorithm is constructed. Experiments show that the improved algorithm has markedly lower space and time complexity. Additionally, a method based on an immune algorithm is proposed, which more easily obtains a small and consistent result.

In Chapter 4, the reduction problem is studied.
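To make the candidate-cut discussion concrete, the following is a minimal sketch of the boundary-cut idea commonly associated with the Boolean-reasoning approach: candidate cuts are midpoints between consecutive distinct attribute values, and a midpoint is kept only where the adjacent decision labels differ. This is an illustration of the general idea only, not the thesis's exact improved strategy; the function name and data are hypothetical.

```python
def candidate_cuts(values, labels):
    """Candidate cuts for one numeric attribute.

    Midpoints between consecutive distinct values are kept only when
    the two adjacent value groups differ in their decision labels,
    since a cut inside a run of a single decision cannot help
    discern objects with different decisions.
    """
    # collect the set of decision labels seen at each distinct value
    label_sets = {}
    for v, d in zip(values, labels):
        label_sets.setdefault(v, set()).add(d)
    vs = sorted(label_sets)
    return [(a + b) / 2
            for a, b in zip(vs, vs[1:])
            if label_sets[a] != label_sets[b] or len(label_sets[a]) > 1]

# four objects, two decisions: of the three possible midpoints,
# only the one on the decision boundary survives
cuts = candidate_cuts([1.0, 2.0, 3.0, 4.0], ['+', '+', '-', '-'])
# cuts == [2.5]
```

Pruning non-boundary midpoints in this way shrinks the cut-candidate set before any search begins, which is why such strategies reduce both the space and time cost of discretization.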
To obtain a good relative reduct of a decision system, a new attribute importance measure is defined from the viewpoint of information theory; it considers the value distribution of the selected attributes in addition to the mutual information between the selected condition attributes and the decision attribute. An algorithm based on this measure is constructed and analyzed. Experiments show that the algorithm reduces decision systems effectively.

In Chapter 5, the acquisition of decision rules is studied, with a focus on inconsistent decision systems. A modified algorithm is proposed in which the lower and upper approximations of all decision classes are computed first, so that rule induction from inconsistent data is reduced to rule induction from consistent data: a set of certain rules is induced from the lower approximations, and a set of possible rules is induced from the upper approximations. The induction process also uses the attribute importance measure. Compared with the LEM2 algorithm, the proposed algorithm can acquire more than one rule at a time. Experiments on the Hayes data set show that the algorithm is effective.
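The reduction of inconsistent rule induction to the consistent case rests on the standard rough set approximations: the lower approximation of a decision class (the source of certain rules) and the upper approximation (the source of possible rules). A minimal sketch, with hypothetical attribute names and a toy decision table:

```python
from collections import defaultdict

def ind_classes(table, attrs):
    """Partition object indices into indiscernibility classes:
    objects with identical values on the given condition attributes."""
    parts = defaultdict(set)
    for i, row in enumerate(table):
        parts[tuple(row[a] for a in attrs)].add(i)
    return list(parts.values())

def approximations(classes, target):
    """Lower approximation: union of indiscernibility classes wholly
    inside the decision class; upper approximation: union of classes
    that intersect it."""
    lower = set().union(*(c for c in classes if c <= target))
    upper = set().union(*(c for c in classes if c & target))
    return lower, upper

# tiny inconsistent decision table: objects 0 and 1 agree on
# condition attribute 'a' but carry different decisions 'd'
table = [{'a': 0, 'd': 'yes'},
         {'a': 0, 'd': 'no'},
         {'a': 1, 'd': 'yes'}]
classes = ind_classes(table, ['a'])                           # [{0, 1}, {2}]
target = {i for i, r in enumerate(table) if r['d'] == 'yes'}  # {0, 2}
low, up = approximations(classes, target)
# low == {2}: certain rules for 'yes' are induced from this region
# up == {0, 1, 2}: possible rules for 'yes' are induced from this region
```

Each approximation is itself consistent with respect to the decision class, which is exactly what lets an ordinary (consistent-data) rule induction algorithm run on it.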
