
Research on Rough-Set-Based Data Mining Methods

Posted on: 2010-08-07    Degree: Master    Type: Thesis
Country: China    Candidate: X Xiong    Full Text: PDF
GTID: 2178360272979354    Subject: Computer application technology
Abstract/Summary:
With the rapid development of database technology and the broad application of database management systems, data volumes are growing very quickly: data is abundant, but knowledge is scarce. Under these conditions, Data Mining emerged as a tool for handling such abundant data. The main methods and technologies currently used in Data Mining include statistical analysis, decision trees, artificial neural networks, genetic algorithms, fuzzy set methods, Rough Set Theory, and visualization techniques. Among them, Rough Set Theory is a particularly effective method for dealing with complicated systems. It is another mathematical tool for handling uncertainty, following Probability Theory, Fuzzy Set Theory, and Evidence Theory. Rough Set Theory can effectively analyze various kinds of incomplete information and discover the knowledge implicit in it.

First, this thesis presents the theory of Data Mining and Rough Sets. On this basis, it conducts an in-depth analysis of the Rough-Set-based Data Mining process, then studies and analyzes the algorithms used in these processes. Classical rough set algorithms cannot adapt effectively to environments with huge amounts of data, because they require the data to reside in memory, and memory capacity is limited; Rough Set methods therefore face the challenge of massive data sets. This thesis introduces a classification structure called the Class Distribution List (CDL). The CDL is obtained by directly classifying the original data set, and it can be regarded as an index block built over the massive data set. Using the CDL, massive data sets can be handled easily. By analyzing the characteristics and structure of the CDL, a simple way to calculate an attribute's conditional information entropy is derived.
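The abstract does not give the exact layout of the Class Distribution List, so the sketch below assumes a minimal form: for each condition attribute, a map from attribute value to a counter of decision classes, built in a single pass over the data (so a massive data set could be scanned chunk by chunk rather than held in memory). The names `build_cdl` and `conditional_entropy` are illustrative assumptions, not identifiers from the thesis.

```python
from collections import Counter, defaultdict
import math

def build_cdl(records, attributes, decision):
    """Build a Class Distribution List in one pass.
    records: iterable of dicts; attributes: condition attribute names;
    decision: name of the decision attribute."""
    cdl = {a: defaultdict(Counter) for a in attributes}
    for row in records:
        for a in attributes:
            # Count how the decision classes distribute for each
            # (attribute, value) pair.
            cdl[a][row[a]][row[decision]] += 1
    return cdl

def conditional_entropy(cdl_attr, total):
    """H(D | a): conditional information entropy of the decision D
    given attribute a, computed from the attribute's class
    distribution alone, without revisiting the raw data."""
    h = 0.0
    for class_counts in cdl_attr.values():
        n = sum(class_counts.values())
        h_v = -sum((c / n) * math.log2(c / n)
                   for c in class_counts.values())
        h += (n / total) * h_v
    return h
```

Because the entropy is computed from the per-value class counts rather than from the rows themselves, the raw data never needs to be in memory once the CDL has been built, which matches the abstract's motivation for using the CDL as an index over massive data sets.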
The thesis improves the algorithms used in the processes of data discretization, attribute reduction, and attribute value reduction. Correctness and scalability experiments show that, without any loss in the accuracy or recognition rate of the original classical rough set algorithms, the improved algorithms can handle massive data sets. Generating the CDL in multiple steps solves the problem of memory limitations and exponentially increases the amount of data that can be processed.
Keywords/Search Tags:Data Mining, Rough Set, discretization, attribute reduction, value reduction