Study On Some Basic Problems Of Data Mining For Classification

Posted on:2004-10-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:R P Li

Full Text:PDF

GTID:1118360122982149

Subject:Management Science and Engineering

Abstract/Summary:

Building effective and scalable algorithms for massive, high-dimensional data is one of the main research directions in data mining. To address this, several basic problems of data mining for classification are studied, as follows.

A structure-adaptive approach to neural-network-based feature selection is proposed. By alternately pruning redundant input features and hidden units, the network architecture is kept reasonable. Experiments show that this method selects features effectively while improving the generalization ability of the network.

A hybrid method for mining classification rules is proposed. Attribute reduction is first performed twice, once by rough set theory and once by a neural network, and rules are then extracted from the reduced decision tables by rough set theory. Experimental results show that the algorithm quickly produces simpler and more effective rules and is robust.

Local discretization methods are simple but give unsatisfactory results, while global discretization methods give better results at a high computational cost. We present a compromise between the two kinds of method: by adding an inconsistency check to an existing entropy-based local approach, our algorithm acquires a global property. Experiments indicate that, with the same rule generator (C4.5), our method produces stronger rules than existing methods.

Several widely used uncertainty measures based on rough set theory and information entropy are compared and analyzed. We prove that these measures are inconsistent in evaluating the uncertainty of rules, and we give a necessary condition for the inconsistency to occur. Directions for building more efficient uncertainty measures are also proposed.

An algorithm based on rough set theory for extracting rules from data class by class is proposed. First a reduct is derived for each class of data; then, for each class, a discernibility matrix and a merger matrix are constructed, and the rules for that class are extracted from the two matrices. Experiments on UCI data sets show that, compared with traditional methods, our algorithm obtains more accurate rules in less time.
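
The idea of pairing an entropy-based local cut with an inconsistency check can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm, whose details the abstract does not give. The function names (`entropy`, `best_cut`, `inconsistency_count`), the single binary cut per attribute, and the majority-class definition of inconsistency are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_cut(values, labels):
    """Local step (assumed form): the midpoint cut that minimises the
    size-weighted class entropy of the two sides, or None if no cut exists."""
    pairs = sorted(zip(values, labels))
    best_point, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two equal values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_point


def inconsistency_count(rows, labels):
    """Global check (assumed form): among objects with identical discretized
    condition values, count those that do not carry the majority class."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


# Toy data: one numeric attribute and a class label per object.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
cut = best_cut(values, labels)
discretized = [[int(v > cut)] for v in values]
conflicts = inconsistency_count(discretized, labels)
```

In a multi-attribute table, a nonzero count after discretizing every attribute locally would signal that the cuts taken together have merged objects of different classes, which is the kind of global information a purely local method ignores.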

Keywords/Search Tags:

Data mining, classification, rough set theory, neural network, entropy

Related items

1	The Research On The Classification Model Based On Rough Set And Entropy
2	The Research On Data Mining Algorithms Based-on Rough Set Theory
3	The Research On Data Preprocessing Based On Rough Sets Theory
4	Mining Classification Rules Base On Rough Set Theory
5	Space Data Mining Research Based On Rough Set Theory
6	Research On Applications Of Data Mining Based On Rough Set Theory
7	Rough Set-based Data Mining Techniques Applied Research In Network Security
8	The Research Of Mining Rules Based On Rough Set Theory
9	The Research Of Data Stream Classification Based On Rough Set Theory-neural Network Integration
10	On Data Mining Methods Based On Rough Set Theory And Its Application In Network Intrusion Detection