Progress in digital data acquisition, distribution, retrieval and storage technology has resulted in the growth of massive databases. One of the greatest challenges that organizations and individuals encountered is how to convert their collections of rapidly expanding data into accessible and actionable knowledge. The attempts to overcome these hurdles have gathered researchers from different areas, such as statistics, machine learning, and databases, which resulted in a new research field, so called Data Mining.Frequent itemset mining plays a crucial role in many data mining applications. It occurs in the discovery of association rules, strong rules, correlations, multidimensional patterns, and many other important discovery tasks. Frequent itemset mining dominates the time complexity of the discovery algorithms. Most existing work focuses on mining all frequent itemsets. However, as any subset of a frequent itemset is also frequent, the size of the set of all the maximal frequent itemsets is much smaller than that of the entire frequent itemsets. It has been observed that it suffices to mine only the set of maximal frequent itemsets instead of every frequent itemsets. There has been some recent research work that applied data mining techniques to an intrusion detection system for discovering new types of attacks. An in-depth analysis of data mining methods used in an intrusion detection system for discovering new types of attacks is presented and some open problems are addressed as well. This dissertation work focuses on maximal frequent itemset mining and its application. A number of new algorithms for mining maximal frequent itemsets with novel pruning strategies and parallelization solutions are presented for more efficiently , and a system model for intrusion detection based maximal frequent itemset mining is proposed for improving the accuracy and performance of an intrusion detection system. This thesis research makes contributions to both data mining and intrusion detection fields.Frequent itemset mining is a time-consuming task. Pruning strategies are widely used to improve the efficiency of searching. However, most of them still explore unnecessary redundant nodes. The classical pruning strategies are thoroughly studied and categorized into three main classes of root pruning, frequent extending and non-extending. Based on the in-depth analysis, a novel and powerful pruning strategy, called multi-level backtrack strategy, is presented in this dissertation. Compared with all the previous pruning strategies, which backtrack step by step, the multi-level backtrack strategy can backtrack best up to k levels when a k-length maximal frequent itemset has been found. The number of nodes need to be extended can be best reduced from to .Maximal frequent itemset mining is an important branch. However, it has not received much attention so far. A novel algorithm for mining maximal frequent itemsets, called MinMax (Mining Maximal), is presented in this dissertation. MinMax employs a vertical database layout scheme. Along with depth first search strategy, it uses a number of optimization techniques to prune the search space such as the multi-level backtrack pruning, root pruning, frequent extending and non-extending pruning, thus cutting down the search space efficiently. By introducing a new concept of no-support of a frequent item, the frequent items are ordered in the way that they lead to a maximal frequent itemset pruning as much as possible. The vertical database layout scheme makes it available to count the support of itemsets simply by set intersection operations instead of scanning database repeatedly. The experiment results show the effective performance, which are better than previous work especially in database with intensive long patterns.Parallel techniques are widely used to improve the efficiency of mining algorithms. A novel and powerful parallel algorithm, P-MinMax (Parallel MinMax), is proposed in this dissertation. Based on its serial version MinMax, the new algorithm decom... |