
Study Of Mining Algorithms For Credible Association Rules

Posted on: 2010-06-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Xiao
Full Text: PDF
GTID: 1118360308961791
Subject: Signal and Information Processing
Abstract/Summary:
Association-rule mining (ARM) is an important research task in the data mining field. Most traditional ARM algorithms are based on the support-confidence framework: infrequent itemsets are pruned by a minimum support threshold, and the association rules that remain are filtered by a minimum confidence threshold. However, we found that with current mining algorithms it is hard to select an appropriate support threshold for datasets with a highly skewed support distribution. If the threshold is too high, many strong affinity patterns involving low-support items are missed. If the threshold is too low, many spurious rules are produced that have no real meaning to the user.

In this thesis, we study how to produce credible, confident and valid association rules. The innovations are as follows:

1. Propose the concept of the Credible Association Rule (CAR)

The items in a CAR have similar support. The presence of one item strongly implies the presence of the other items in the same rule; that is, the items in a rule co-occur. When mining such rules, the support threshold can be ignored, so frequent and infrequent patterns can be produced together. To measure CARs, we propose credibility, which represents the degree of co-occurrence of the items in a rule. Measures based on distance and on h-confidence are also described. The experimental results show that CARs are plentiful in various kinds of datasets and that their credibility and confidence both exceed those of traditional association rules, so CARs can be applied in many fields.

2. Present the MaxCliqueMining algorithm for CARs, based on maximal cliques

The algorithm creates the 2-item credible sets using an adjacency matrix and then generates all rules from maximal cliques, without scanning the dataset many times. MaxCliqueMining can mine CARs based not only on the credibility measure but also on the lift, cosine and correlation measures. Only the creation of the 2-item credible sets differs among these measures. Our experimental results show the effectiveness and accuracy of this method, especially for datasets with a skewed support distribution.

3. Present the HHCP-growth algorithm for unified mining of hyperclique patterns and maximal hyperclique patterns

The hyperclique pattern (HP) and the maximal hyperclique pattern (MHP) are two special types of CARs based on h-confidence. The standard algorithms for mining the two kinds of patterns are different. We present a fast algorithm called hybrid hyperclique pattern growth (HHCP-growth), based on the FP-tree, which unifies the mining processes of the two patterns. The algorithm mines recursively and does not need to store a large number of generated candidates. Besides traditional minimum support pruning, the algorithm exploits several efficient pruning strategies, such as maximum support pruning, item self pruning and remaining item pruning, which reduce the number of recursions and traversals. We prove the effectiveness of these strategies and the validity of the algorithm. The experimental results show that HHCP-growth is more effective than the standard HP and MHP mining algorithms, especially for large-scale datasets or at low support levels.

4. Release standard alarm datasets for alarm correlation analysis and study

In our study, we collected alarms over several periods from the GPRS network management system of a provincial mobile corporation and from the simulated network management system of an equipment vendor. After preprocessing, denoising and filtering of sensitive information, these real alarm data were transformed into a standard data format for mining. The datasets can be downloaded freely on the Web and used as standard datasets for alarm correlation analysis and study.
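The clique-based idea in item 2 can be illustrated with a small sketch. This is not the thesis's MaxCliqueMining implementation: the abstract does not define the credibility measure, so pairwise h-confidence (support of the itemset divided by the largest single-item support) is used here as a stand-in; maximal cliques are enumerated with a plain Bron-Kerbosch recursion, which is an assumed choice; and all function names and the toy transactions are hypothetical.

```python
from itertools import combinations

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def h_confidence(itemset, transactions):
    """Itemset count divided by the largest single-item count."""
    top = max(count({i}, transactions) for i in itemset)
    return count(itemset, transactions) / top

def credible_graph(transactions, threshold):
    """Adjacency sets linking items whose pair passes the threshold.
    Stand-in for the '2-item credible sets' (assumption: h-confidence)."""
    items = sorted(set().union(*transactions))
    adj = {}
    for a, b in combinations(items, 2):
        if h_confidence((a, b), transactions) >= threshold:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting (assumed enumerator)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if r:
                found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def mine(transactions, threshold):
    """Maximal cliques of the pair graph, re-checked as whole itemsets.
    A clique that fails the re-check is dropped; its subsets are not searched."""
    adj = credible_graph(transactions, threshold)
    scored = {c: h_confidence(c, transactions) for c in maximal_cliques(adj)}
    return {c: h for c, h in scored.items() if h >= threshold}

# Toy data, invented for illustration: two common items, three rare ones.
transactions = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
    {"bread", "milk"}, {"bread"}, {"milk"},
    {"caviar", "vodka", "blini", "bread"}, {"caviar", "vodka", "blini"},
    {"caviar", "vodka"}, {"milk", "eggs"},
]
result = mine(transactions, threshold=0.6)
```

With this toy data, `result` holds two itemsets: {bread, milk} and {caviar, vodka, blini}. The second occurs in only 2 of the 10 transactions, so a typical minimum support threshold would prune it, yet it passes because its items have similar support and co-occur. No support threshold is used anywhere in the sketch.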
Keywords/Search Tags: credible association rule (CAR), data mining, maximal clique, hyperclique pattern, alarm correlation analysis, credibility