With the rapid development of database technology and the widespread application of Database Management Systems, the volume of data is growing very quickly. As a result, data is abundant but knowledge is scarce. Data Mining has emerged under these conditions as a tool for dealing with such abundant data. The main methods and techniques currently used in data mining include statistical analysis, decision trees, artificial neural networks, genetic algorithms, fuzzy set methods, Rough Sets Theory and visualization techniques. Among these, Rough Sets Theory is a particularly effective method for dealing with complex systems.

Rough Sets Theory, proposed by the Polish mathematician Z. Pawlak in 1982, is a new data analysis theory for analyzing and processing uncertain and incomplete data. Rough Sets have a clear advantage in Data Mining: they require no knowledge beyond the data being processed. The theory uses equivalence relations to measure the degree of uncertainty of knowledge, and precisely because of this, Rough Sets Theory has strong vitality in Data Mining. At present, Rough Sets Theory is applied in many areas of Data Mining, for example attribute reduction, data discretization, association rule mining and classification rule mining. This paper discusses attribute reduction and real-valued attribute discretization based on Rough Sets Theory.

Attribute reduction is one of the most important problems in Rough Sets Theory. It reduces knowledge by deleting unnecessary attributes while preserving the information contained in the decision table. Because Rough Sets Theory cannot handle real-valued attributes directly, its application is constrained, yet real-valued attributes are common in practice.
Real-valued attributes therefore need to be discretized during data preprocessing.

For these two applications of Rough Sets, the work done in this paper is as follows:
(1) An attribute reduction algorithm based on a reduction tree is proposed. The algorithm is simple and easy to understand, and it also reduces time complexity to some extent.
(2) From the viewpoint of logical algebra, the discernible Boolean matrix is defined. Its properties and the transformations used to reduce it are presented. An attribute reduction model represented by the discernible Boolean matrix and linear logical equations is established, and a method for finding the solutions of the model is discussed. A necessary and sufficient condition for the linear logical equations to have a solution, and another for them to have a unique solution, are obtained. The concept of a classification coefficient is proposed. Finally, a simple, intuitive and efficient attribute reduction algorithm based on the classification coefficient and linear logical equations is given.
(3) The discernible Boolean matrix and logical equations are applied to real-valued attribute discretization. Logical equations relating cut sets to the discernible Boolean matrix are established, and a new discretization algorithm for real-valued attributes is given.
(4) The attribute reduction algorithm and the new discretization algorithm are applied to Data Mining, and a Data Mining model based on Rough Sets is proposed.

Many problems in Data Mining and Rough Sets Theory, and in combining the two, remain to be studied. This work will be carried out in the future.
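To make the idea of attribute reduction concrete, the following is a minimal Python sketch of the classical discernibility-matrix approach, stored as a Boolean matrix: one row per pair of objects with different decisions, one column per condition attribute, and `True` where the pair differs on that attribute. An attribute subset is treated as a reduct if it has a `True` in every row and contains no smaller such subset. This is not the reduction-tree or classification-coefficient algorithm summarized above; the function names, the dict-based table format and the toy table are hypothetical, and the exhaustive search is exponential, so it only suits tiny tables.

```python
from itertools import combinations

# Hypothetical helpers illustrating the classical discernibility-matrix idea;
# not the reduction-tree or classification-coefficient algorithm of this paper.

def discernibility_rows(table, conditions, decision):
    """Boolean discernibility matrix: one row per pair of objects whose
    decisions differ, one column per condition attribute; an entry is True
    when the two objects take different values on that attribute."""
    rows = []
    for x, y in combinations(table, 2):
        if x[decision] == y[decision]:
            continue  # same decision: this pair need not be discerned
        row = [x[a] != y[a] for a in conditions]
        if any(row):  # pairs no attribute can separate (inconsistent) are skipped
            rows.append(row)
    return rows


def keeps_discernibility(rows, conditions, subset):
    """True if every row has a True in some column belonging to `subset`."""
    cols = [conditions.index(a) for a in subset]
    return all(any(row[c] for c in cols) for row in rows)


def minimal_reducts(table, conditions, decision):
    """Exhaustive search, smallest subsets first; exponential in the number
    of condition attributes, so only suitable for tiny tables."""
    rows = discernibility_rows(table, conditions, decision)
    reducts = []
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller reduct, so it is not minimal
            if keeps_discernibility(rows, conditions, subset):
                reducts.append(subset)
    return reducts


# Hypothetical toy decision table: condition attributes a, b, c; decision d.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
]

if __name__ == "__main__":
    print(minimal_reducts(TABLE, ["a", "b", "c"], "d"))  # [('c',), ('a', 'b')]
```

On the toy table, attribute `c` alone separates the two decision classes, and so does the pair `a`, `b` (`d` is "yes" exactly when `a` and `b` differ), so both are minimal reducts. Because keeping discernibility is monotone (any superset of a valid subset is also valid), scanning subsets by increasing size and skipping supersets of reducts already found is enough to guarantee minimality.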