Text Classification Discovery Based On Association Rules

Posted on: 2011-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2178360308954092
Subject: Computer software and theory
Abstract:
Automatic text classification is an important part of data management; its goal is to assign texts automatically to a set of known categories. Compared with other text classification methods, classification based on association rules not only generates classification rules that are easy to understand, but is also efficient and effective. It has therefore become one of the main methods for automatic text classification.

This paper analyzes two problems of classification based on association rules. First, when categories are predicted directly from association rules, the support of some training texts may be counted more than once, so the classification role of those training texts is overemphasized. Second, when the weights of rules are determined only from the weights of items, the classification role of the items is overemphasized and the classification role of the association is ignored.

To solve these problems, this paper proposes an improved text classification algorithm based on association rules, named WCCPF. It makes three improvements. First, a more reasonable rule weighting: the new weighting method is based not only on the training texts but also takes the influence of unknown texts into account, and it uses similarity computation to make the rule weights more reasonable. Second, an improved classifier, the CPF-tree, based on the CR-tree: the new classifier generates classification rules quickly and dynamically according to the unknown texts, which avoids repeatedly computing the support of training texts. Third, a new pruning method, which builds on mature pruning methods and uses the maximum frequent set to prune the new classifier.

The experimental results show that the classification algorithm proposed in this paper improves the precision of text classification.
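For readers unfamiliar with the baseline approach, the following is a minimal Python sketch of generic classification based on association rules. It is not WCCPF and does not implement the CPF-tree, the similarity-based weighting, or the pruning described in the abstract. The function names `mine_rules` and `classify`, the thresholds, and the toy data are all assumptions made for illustration. Rules of the form "itemset implies category" are mined by brute-force counting, and an unknown text is labelled by summing the confidence of every rule it matches.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(docs, labels, min_support=0.3, min_confidence=0.6, max_len=2):
    """Mine rules (itemset -> label) by brute-force counting (illustrative only)."""
    n = len(docs)
    itemset_count = Counter()  # number of texts containing the itemset
    joint_count = Counter()    # number of texts containing it and carrying the label
    for tokens, label in zip(docs, labels):
        items = sorted(set(tokens))
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                itemset_count[itemset] += 1
                joint_count[(itemset, label)] += 1

    rules = []
    for (itemset, label), joint in joint_count.items():
        support = joint / n                         # fraction of all training texts
        confidence = joint / itemset_count[itemset]  # P(label | itemset)
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(itemset), label, support, confidence))
    return rules


def classify(rules, tokens, default=None):
    """Label a text by summing the confidence of every rule it matches."""
    words = set(tokens)
    scores = defaultdict(float)
    for itemset, label, _support, confidence in rules:
        if itemset <= words:  # every item of the rule occurs in the text
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else default


# Hypothetical toy training set: each text is a list of tokens.
docs = [
    ["stock", "market", "price"],
    ["stock", "bank", "price"],
    ["match", "team", "goal"],
    ["team", "goal", "league"],
]
labels = ["finance", "finance", "sport", "sport"]

rules = mine_rules(docs, labels, min_support=0.5, min_confidence=1.0)
finance_guess = classify(rules, ["stock", "price", "rise"])
sport_guess = classify(rules, ["goal", "team"])
unknown_guess = classify(rules, ["weather"])
```

In this sketch the overlapping rules for `stock`, `price`, and the pair of both are supported by the same two training texts, yet each contributes separately to the score. That is arguably the kind of repeated counting the abstract identifies as the first problem, and scoring rules uniformly by confidence leaves open the weighting question it raises as the second.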
Keywords: Association rules, Weighted rules, Text classification, CP-tree, CPF-tree