
Research on Rough-Set-Based Data Mining Methods

Posted on: 2010-08-07    Degree: Master    Type: Thesis
Country: China    Candidate: X Xiong    Full Text: PDF
GTID: 2178360272979354    Subject: Computer application technology
Abstract/Summary:
With the rapid development of database technology and the broad application of database management systems, data volumes are growing very quickly: data is abundant, but knowledge is scarce. Under these conditions, Data Mining emerged as a tool for handling such abundant data. The main methods and technologies currently used in Data Mining include statistical analysis, decision trees, artificial neural networks, genetic algorithms, fuzzy set methods, Rough Set Theory, and visualization techniques. Among them, Rough Set Theory is a particularly effective method for dealing with complicated systems. It is another mathematical tool for handling uncertainty, following Probability Theory, Fuzzy Set Theory, and Evidence Theory. Rough Set Theory can effectively analyze various kinds of incomplete information and discover the knowledge implicit in it.

First, this thesis presents the theory of Data Mining and Rough Sets. On this basis, it conducts an in-depth analysis of the Rough-Set-based Data Mining process, then studies and analyzes the algorithms used in these processes. Classical rough set algorithms cannot adapt effectively to environments with huge amounts of data, because they require the data to reside in memory, and memory capacity is limited; Rough Set methods therefore face the challenge of massive data sets. This thesis introduces a classification structure called the Class Distribution List (CDL). The CDL is obtained by directly classifying the original data set, and it can be regarded as an index block built over the massive data set. Using the CDL, massive data sets can be handled easily. By analyzing the characteristics and structure of the CDL, a simple way to calculate an attribute's conditional information entropy is derived.
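The abstract does not give the exact layout of the Class Distribution List, so the sketch below assumes a minimal form: for each condition attribute, a map from attribute value to a counter of decision classes, built in a single pass over the data (so a massive data set could be scanned chunk by chunk rather than held in memory). The names `build_cdl` and `conditional_entropy` are illustrative assumptions, not identifiers from the thesis.

```python
from collections import Counter, defaultdict
import math

def build_cdl(records, attributes, decision):
    """Build a Class Distribution List in one pass.
    records: iterable of dicts; attributes: condition attribute names;
    decision: name of the decision attribute."""
    cdl = {a: defaultdict(Counter) for a in attributes}
    for row in records:
        for a in attributes:
            # Count how the decision classes distribute for each
            # (attribute, value) pair.
            cdl[a][row[a]][row[decision]] += 1
    return cdl

def conditional_entropy(cdl_attr, total):
    """H(D | a): conditional information entropy of the decision D
    given attribute a, computed from the attribute's class
    distribution alone, without revisiting the raw data."""
    h = 0.0
    for class_counts in cdl_attr.values():
        n = sum(class_counts.values())
        h_v = -sum((c / n) * math.log2(c / n)
                   for c in class_counts.values())
        h += (n / total) * h_v
    return h
```

Because the entropy is computed from the per-value class counts rather than from the rows themselves, the raw data never needs to be in memory once the CDL has been built, which matches the abstract's motivation for using the CDL as an index over massive data sets.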
The thesis improves the algorithms used in the processes of data discretization, attribute reduction, and attribute value reduction. Correctness and scalability experiments show that, without any loss in the accuracy or recognition rate of the original classical rough set algorithms, the improved algorithms can handle massive data sets. Generating the CDL in multiple steps solves the problem of memory limitations and exponentially increases the amount of data that can be processed.
Keywords/Search Tags:Data Mining, Rough Set, discretization, attribute reduction, value reduction