Rules Extraction From ANN Based On Clustering

Posted on: 2009-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ma
GTID: 2178360242480852
Subject: Computer application technology
Abstract/Summary:
Since the 1980s, artificial neural network (ANN) techniques have been applied successfully to problem domains such as classification and optimization. However, a neural network is a black-box model whose learned knowledge is concealed in a large number of connections. Users therefore cannot understand what the network has learned, what it can do, how it will predict, or why it reaches its conclusions. This has not only weakened users' confidence in building intelligent systems with neural computing techniques, but has also hindered the application of neural networks to data mining.
Rule extraction from trained neural networks provides a way of explaining how a neural network functions, which is important if neural networks are to gain wider acceptance. On the one hand, using rules to explain the knowledge embedded in a network lets users and designers understand how the network works. On the other hand, the rules can help users discover relations that were previously overlooked. A considerable amount of research has been carried out to develop mechanisms, procedures and techniques for extracting rules from trained neural networks.
We propose a novel clustering-based algorithm for extracting rules from artificial neural networks, called REBC. REBC extracts rules from a pruned network with a single hidden layer. The method first prunes the network with the N2P2F algorithm. The pruning process attempts to eliminate as many connections and hidden or input units as possible while maintaining a prespecified accuracy rate. The activation values of the hidden units are then clustered into discrete values. From these discrete values, rules from the hidden units to the output units are generated with the X2R algorithm. These inner rules are simplified based on the weights between the hidden units and the outputs.
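The step of clustering hidden-unit activation values into discrete values can be sketched as follows. The abstract does not say which clustering procedure REBC uses, so this is only an illustrative greedy one-dimensional clustering; the function names, the tolerance `eps`, and the sample activations are all assumptions made for the example.

```python
from typing import List


def cluster_activations(values: List[float], eps: float = 0.2) -> List[float]:
    """Greedy 1-D clustering (assumed scheme, not taken from the thesis).

    Sorted values are grouped while they stay within eps of the first
    member of the current cluster; each cluster is summarised by its mean.
    """
    clusters: List[List[float]] = []
    for v in sorted(values):
        if clusters and v - clusters[-1][0] <= eps:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def discretize(value: float, centres: List[float]) -> float:
    """Replace an activation by its nearest cluster centre."""
    return min(centres, key=lambda c: abs(c - value))


# Hypothetical activations of one hidden unit over the training patterns.
activations = [-0.98, -0.95, -0.91, 0.02, 0.05, 0.93, 0.97]
centres = cluster_activations(activations, eps=0.2)
discrete = [discretize(a, centres) for a in activations]
```

With a small number of centres per hidden unit, each hidden unit behaves like a discrete attribute, which is what makes rule generation from the hidden layer to the output layer tractable.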
REBC applies a new technique: when many inputs are connected to a hidden unit, the weights between the input units and that hidden unit are clustered. The hidden units are processed in turn. If the number of weights between the inputs and a hidden unit is small, the neuron is described directly. Otherwise, the weights are clustered separately for each discrete activation value of that hidden unit. During this process, the number of weight clusters is adjusted dynamically according to the subset of input patterns that produces the current activation value. Rules between the input and hidden layers can then be generated in a form similar to M-of-N (MOFN) rules.
On the basis of the network's high accuracy, REBC reduces both the number of rules and the number of conditions per rule, so the efficiency of the algorithm increases greatly. Because the number of clusters is small, the approach becomes more effective as the network becomes more complex.
The correctness and practicability of REBC are validated on the seven-segment-digit problem and the Wisconsin Breast Cancer dataset. The seven-segment-digit problem was designed because of its incomplete display. The problem is simple and its dataset is small, so it mainly validates correctness. The breast cancer dataset is available from the University of California, Irvine (UCI) machine learning repository. It is used to validate the practicability of the algorithm, and the results are compared with the rules extracted by other algorithms.
We trained a network to solve the seven-segment-digit problem, which is a real-world problem. The target here is to decide whether a digit is odd or even. We use 10 samples as the training set (5 odd and 5 even), and the test set is the same as the training set. Classifying with the extracted rules gives an accuracy of 100%, which shows that the REBC algorithm is correct.
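To illustrate the M-of-N rule form mentioned above, here is a minimal sketch of how a group of clustered weights can be read as such a rule. It assumes binary inputs, a single positive weight shared by every input in the cluster, and a hidden unit that counts as active when its net input exceeds zero. These assumptions and the helper names are hypothetical, not details taken from REBC.

```python
from typing import Optional, Sequence


def m_of_n_fires(m: int, conditions: Sequence[bool]) -> bool:
    """An M-of-N rule fires when at least m of its n conditions hold."""
    return sum(conditions) >= m


def smallest_m(shared_weight: float, bias: float, n: int) -> Optional[int]:
    """Smallest m in 0..n with m * shared_weight + bias > 0, else None.

    If all n inputs in a cluster share one weight, the net input depends
    only on how many of them are on, so the unit reduces to a count test.
    """
    for m in range(n + 1):
        if m * shared_weight + bias > 0:
            return m
    return None


# Hypothetical cluster: three inputs with shared weight 2.0 and bias -3.0.
m = smallest_m(shared_weight=2.0, bias=-3.0, n=3)
fires = m_of_n_fires(m, [True, False, True])
```

In this toy case the unit becomes the rule "at least 2 of the 3 inputs are on", which replaces an enumeration of the individual input combinations by a single count condition.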
The algorithm clusters the weights between the input and hidden units, which shrinks the search space from 2 × 2^7 = 256 to 2^5 + 2^4 = 48. Efficiency increases greatly, and the number of rules is small.
The breast cancer dataset is used to validate practicability. Its nine measurements, taken from fine-needle aspirates of human breast tissue, correspond to cytological characteristics of a benign or a malignant sample. The comparison is performed along five dimensions: 1. the size of the training set; 2. the accuracy for each class; 3. predictive accuracy; 4. the average number of conditions per rule; and 5. the number of rules. The results show that, in terms of classification accuracy, the rules generated by REBC are more than 95% accurate, and that both the average number of conditions per rule and the number of rules are small. The understandability of the network is thus greatly improved.
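Three of the comparison dimensions listed above (number of rules, average number of conditions per rule, and accuracy) can be computed for any rule set along the following lines. The rule representation, the first-match classification policy, and the toy samples are assumptions made purely for illustration; they are not the thesis's rules or its breast cancer results.

```python
from typing import Dict, List, Tuple

# A rule is (conditions, predicted class); conditions map feature -> value.
Rule = Tuple[Dict[str, int], str]


def classify(rules: List[Rule], default: str, sample: Dict[str, int]) -> str:
    """Return the class of the first rule whose conditions all match."""
    for conditions, label in rules:
        if all(sample.get(f) == v for f, v in conditions.items()):
            return label
    return default


def rule_set_stats(rules: List[Rule], default: str,
                   samples: List[Dict[str, int]],
                   labels: List[str]) -> Dict[str, float]:
    """Number of rules, mean conditions per rule, and accuracy."""
    predictions = [classify(rules, default, s) for s in samples]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "num_rules": len(rules),
        "avg_conditions": sum(len(c) for c, _ in rules) / len(rules),
        "accuracy": correct / len(samples),
    }


# Made-up toy rule set and samples, for illustration only.
toy_rules: List[Rule] = [({"a": 1, "b": 1}, "pos"), ({"c": 1}, "pos")]
toy_samples = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 0},
]
toy_labels = ["pos", "pos", "neg", "pos"]
stats = rule_set_stats(toy_rules, "neg", toy_samples, toy_labels)
```

Fewer rules and fewer conditions per rule generally make a rule set easier to read, which is why these counts are reported alongside accuracy.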
Keywords/Search Tags: Extraction