
Researching Of Association Rules In Text Classification

Posted on: 2009-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: R Li
GTID: 2178360278471003
Subject: Computer application technology
Abstract/Summary:
Association rule mining and text classification are both key problems in data mining. They are widely applied in data mining tasks and have received considerable academic attention in recent years. After surveying the applications of association rules in text classification, this thesis improves the associative text classification algorithm with the aim of raising classification efficiency.

The thesis studies text feature extraction, feature selection, general text classification algorithms, the Apriori algorithm for association rule mining, and the CBA algorithm for associative text categorization, and puts forward a more efficient text classification algorithm. To address the main weakness of Apriori, its low efficiency, the rule generator step is improved from different points of view, yielding two improved algorithms. The main innovations are as follows:

1. Improving the associative text classification algorithm by using the properties of complete graphs. The improved algorithm builds a matrix and, from it, a frequent-itemset association graph. Combined with the properties of itemsets, the correspondence between this association graph and its complete subgraphs is then explored further. The merit of the algorithm is that it no longer needs the (k-1)-itemsets: it obtains the k-itemsets directly from the complete subgraphs of the association graph.

2. Improving the associative text classification algorithm with binary granular computing. An improved association rule algorithm based on binary granular computing is proposed. Working on information granules, it replaces the join step of Apriori, which requires scanning the database, with a binary "AND" operation, reducing the complexity of the algorithm. After the AND operation, the number of "1" bits in the resulting granule is compared with the minimum support, which eliminates the separate pruning pass and improves efficiency. This algorithm is applied to the CBA-RG process in place of the original Apriori algorithm in CBA, which improves the efficiency of the associative text classification algorithm.

Both improved algorithms are more efficient than the CBA algorithm. Each has its own merits, and their relative efficiency differs depending on the text database.
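The idea of replacing database scans with a binary "AND" can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the use of Python integers as bitmaps, and the interpretation of minimum support as an absolute transaction count are all assumptions made for illustration. Each item gets a bitmap (its "granule") with one bit per document; the support of an itemset is the number of 1 bits in the AND of its members' bitmaps.

```python
from itertools import combinations


def build_granules(transactions):
    """Map each item to an integer bitmap: bit i is 1 if transaction i contains the item."""
    granules = {}
    for i, items in enumerate(transactions):
        for item in items:
            granules[item] = granules.get(item, 0) | (1 << i)
    return granules


def count_ones(bits):
    """Number of 1 bits in the bitmap, i.e. the support count."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise search where support comes from AND-ing bitmaps (hypothetical sketch).

    min_support is assumed to be an absolute number of transactions.
    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    granules = build_granules(transactions)
    # Level 1: keep single items whose bitmap has enough 1 bits.
    current = {
        frozenset([item]): bits
        for item, bits in granules.items()
        if count_ones(bits) >= min_support
    }
    result = {}
    while current:
        for itemset, bits in current.items():
            result[itemset] = count_ones(bits)
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            candidate = a | b
            # Only join pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # The AND of the two bitmaps marks transactions containing every item,
            # so no rescan of the transaction database is needed.
            bits = bits_a & bits_b
            if count_ones(bits) >= min_support:
                next_level[candidate] = bits
        current = next_level
    return result


if __name__ == "__main__":
    docs = [
        {"data", "mining", "text"},
        {"data", "mining"},
        {"text", "rule"},
        {"data", "mining", "rule"},
    ]
    for itemset, support in frequent_itemsets(docs, min_support=2).items():
        print(sorted(itemset), support)
```

In this sketch the support check happens immediately after the AND, so infrequent candidates are never stored and no separate pruning pass over the candidates is required.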
Keywords/Search Tags: Association Rules, Text Classification, Apriori algorithm, CBA algorithm, Granule Computing, complete graph