
Study On Some Basic Problems Of Data Mining For Classification

Posted on: 2004-10-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: R P Li
GTID: 1118360122982149  Subject: Management Science and Engineering
Abstract/Summary:
Given massive volumes of high-dimensional data, building effective and scalable algorithms is one of the main research directions in data mining. To address this issue, this dissertation studies several basic problems of data mining for classification in depth:

A structure-adaptive approach to neural-network-based feature selection is proposed. By alternately pruning redundant input features and hidden units, the approach keeps the network architecture reasonable. Experiments show that the method selects features effectively while improving the generalization ability of the network.

A hybrid method for mining classification rules is proposed. Attribute reduction is first performed twice, once with rough set theory and once with a neural network, and rules are then extracted from the reduced decision tables using rough set theory. Experimental results show that the algorithm quickly produces simpler and more effective rules and is robust.

Local discretization methods are simple but give unsatisfactory results, whereas global discretization methods give better results at a high computational cost. We present a compromise between these two kinds of discretization: by adding an inconsistency check to an existing entropy-based local method, our algorithm acquires a global property. Experiments indicate that, with the same rule generator (C4.5), our method produces stronger rules than existing methods.

Several widely used uncertainty measures based on rough set theory and information entropy are compared and analyzed. We prove that these measures can be inconsistent when evaluating the uncertainty of rules, and we give a necessary condition for this inconsistency to occur. A direction for building more efficient uncertainty measures is also proposed.

An algorithm based on rough set theory for extracting rules from data class by class is proposed.
First, a reduct is derived for each class of data; then, for each class, a discernibility matrix and a merger matrix are constructed, and the rules for that class are extracted from the two matrices. Experiments on UCI data sets show that, compared with traditional methods, the algorithm obtains more accurate rules in less time.
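The class-by-class rule extraction described above rests on two standard rough-set notions: a reduct for each class and a discernibility matrix for each class. The abstract does not give the exact construction, and the merger matrix is not modelled at all, so the Python below is only a minimal sketch under stated assumptions. It uses a hypothetical toy decision table; it assumes a class-relative discernibility matrix whose entries are the condition attributes separating an object of the target class from an object of another class; it finds class reducts by brute-force search for minimal attribute sets that intersect every entry (one textbook route, whereas the abstract derives the reduct before building the matrix); and it reads rules off by projecting the class's objects onto a reduct. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c; decision d.
TABLE = [
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
    {"a": 0, "b": 0, "c": 0, "d": "no"},
]
CONDITIONS = ("a", "b", "c")


def class_discernibility(table, conditions, decision, target):
    """Class-relative discernibility entries (an assumed formulation): for every
    pair (x, y) with x in the target class and y outside it, the set of
    condition attributes on which x and y differ."""
    inside = [r for r in table if r[decision] == target]
    outside = [r for r in table if r[decision] != target]
    entries = []
    for x in inside:
        for y in outside:
            diff = frozenset(a for a in conditions if x[a] != y[a])
            # An empty entry means x and y are indistinguishable (inconsistent);
            # no attribute set can separate them, so the pair is skipped.
            if diff:
                entries.append(diff)
    return entries


def class_reducts(entries, conditions):
    """All minimal attribute subsets that intersect every entry, found by brute
    force in order of increasing size (only practical for a few attributes)."""
    reducts = []
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            s = set(subset)
            hits_all = all(s & e for e in entries)
            # Skip supersets of an already found (smaller) reduct.
            if hits_all and not any(r <= s for r in reducts):
                reducts.append(s)
    return reducts


def class_rules(table, decision, target, reduct):
    """Project each target-class object onto the reduct; each distinct projection
    is read as a rule 'conjunction of (attribute = value) -> target'."""
    rules = {
        tuple(sorted((a, r[a]) for a in reduct))
        for r in table
        if r[decision] == target
    }
    return sorted(rules)


if __name__ == "__main__":
    entries = class_discernibility(TABLE, CONDITIONS, "d", "yes")
    reducts = class_reducts(entries, CONDITIONS)
    print([sorted(r) for r in reducts])  # [['a'], ['b', 'c']]
    print(class_rules(TABLE, "d", "yes", reducts[0]))  # [(('a', 1),)]
```

In this toy table, attribute a alone separates the two classes, and b and c together also do (the class is "yes" exactly when b differs from c), so both {a} and {b, c} come out as class reducts. The brute-force search is exponential in the number of attributes; practical rough-set tools rely on heuristic reduct search instead.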
Keywords/Search Tags: Data mining, classification, rough set theory, neural network, entropy