Knowledge discovery in databases (KDD) is a multidisciplinary field that draws on database technology, artificial intelligence, statistics, and other areas. The decision tree is a KDD method widely used to mine classification models; it has been studied extensively and has seen great progress. However, because tree induction algorithms adopt a greedy strategy, decision trees tend to over-fit, to grow large, and to induce long classification rules. Many methods have been proposed to remedy these flaws. This thesis studies these methods comprehensively and then puts forward a new method for optimizing decision trees. The thesis makes three main points:

1. A general survey of KDD is given, covering its definition, basic process, main methods, and state of development. Decision trees and several other methods for mining classification rules are introduced in particular detail.

2. A detailed survey of decision tree optimization approaches is given, including modifying the test space, modifying the test search, restricting the database, and using alternative data structures. The classical algorithms of each kind of approach are also summarized and critiqued.

3. A new approach is proposed that reduces the set of test attributes of decision trees, based on knowledge reduction from Rough Set theory. The approach removes test attributes that are irrelevant to the classification, so that relatively smaller training sets can be obtained to induce relatively smaller decision trees without reducing accuracy.

In the last part, we evaluate the method on several data sets and compare it with the ID3 algorithm.
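The abstract does not state which reduction algorithm the thesis uses, so the following is only an illustrative sketch of rough-set attribute reduction, not the thesis's method. It uses a greedy, dependency-based search in the style of the well-known QuickReduct heuristic. For a set of condition attributes B and decision attribute D, the degree of dependency is gamma_B(D) = |POS_B(D)| / |U|, where the positive region POS_B(D) collects the objects whose B-indiscernibility class has a single decision value. Attributes are added until the dependency equals that of the full condition set. The function names, the toy table `TOY_TABLE`, and its attributes are hypothetical, and the table is assumed to be non-empty.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]  # one object of the decision table: attribute -> value


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes (row indices) of the indiscernibility relation IND(attrs)."""
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """gamma_attrs(decision) = |POS_attrs(decision)| / |U|; assumes rows is non-empty."""
    positive = 0
    for block in partition(rows, attrs):
        # A class lies in the positive region if all its objects share one decision value.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], conditions: Sequence[str], decision: str) -> List[str]:
    """Greedily add the attribute that raises the dependency most, until it
    matches the dependency of the full condition set (hypothetical helper)."""
    target = dependency(rows, conditions, decision)
    reduct: List[str] = []
    while dependency(rows, reduct, decision) < target:
        candidates = [a for a in conditions if a not in reduct]
        best = max(candidates, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical toy decision table: "temp" turns out to be irrelevant to "play".
TOY_TABLE: List[Row] = [
    {"outlook": "sunny", "windy": "no", "temp": "hot", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "temp": "mild", "play": "no"},
    {"outlook": "rain", "windy": "no", "temp": "mild", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "temp": "mild", "play": "no"},
    {"outlook": "overcast", "windy": "no", "temp": "hot", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "temp": "mild", "play": "yes"},
]

if __name__ == "__main__":
    print(quick_reduct(TOY_TABLE, ["outlook", "windy", "temp"], "play"))  # ['outlook', 'windy']
```

On this toy table the sketch keeps `outlook` and `windy` and drops `temp`, so a tree learner such as ID3 would only need to consider two candidate test attributes. Note that a greedy search like this is not guaranteed to return a minimal reduct; it is shown here only to make the idea of removing classification-irrelevant attributes concrete.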