Font Size: a A A

Research On Mining Maximal Frequent Itemsets

Posted on:2017-03-07Degree:MasterType:Thesis
Country:ChinaCandidate:X J ZhangFull Text:PDF
GTID:2308330503979767Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the advent of big data, data mining is more and more important for it can turn data into valuable information. Discovering association rules is the most important task of data mining. And mining frequent itemsets is the main step to find association rules. However, not only the large number of frequent itemsets, there exists the problem of information redundancy. Therefore, the maximum frequent itemsets come into being. The maximal frequent itemsets could compress the frequent itemsets and ease the pressure on data storage. In addition, maximal frequent itemsets play an important role in practical application, such as Web mining and DNA analysis. How to improve the efficiency of mining maximal frequent itemsets a significant research direction.This paper starts from the background and significance of mining maximal frequent itemsets. Through the domestic and foreign research, it is obvious found that algorithms of mining maximal frequent itemsets are on the basis of Apriori and FP-Tree. But Apriori-based algorithms generate many useless candidate itemsets during the process of mining frequent itemsets. In order to compute the support of itemsets, it is a waste of time to scan the database multiple times. And with the increase of width and depth, building FP-Tree will consume a great deal of time and space. Based these issues, we study the relevant problems in the process of mining maximal frequent itemsets. Review the search strategy and pruning strategy which will be used in the course of mining maximal frequent itemsets. Study four classic algorithms for mining maximal frequent itemsets to provide theoretical basis.GBMFI(Go-Back Maximal Frequent Itemsets)——an algorithm to mine maximal frequent itemsets is proposed. It uses the enumeration tree as search space to overcome bottom-up and top-down search strategy by depth-first search strategy. The database is presented in vertical format to reduce the computational complexity. And GBMFI uses infrequent subsets, frequent supersets and the support of parent and children to prune the tree. Compared with DepthProject, the effectiveness of the GBMFI is demonstrated by experiments on 4 data sets. And we make a research on the relationship between the minimum support and the number of maximal frequent itemsets. The results show that the algorithm works well. In addition, a parallel algorithm AMI for mining maximal frequent itemsets is proposed. Despite the poor results, AMI indicates a direction for further research.
Keywords/Search Tags:Data Mining, Maximal Frequent Itemsets, Backtrack Search, Pruning
PDF Full Text Request
Related items