Research On Mining Algorithm Of Maximal Frequent Itemsets

Posted on: 2020-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2428330590971674
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of information technology, the amount of data generated across industries has exploded, yet people still lack sufficient, valuable patterns to draw on when forecasting an industry's prospects. Mining valuable information from massive data has therefore become a research hotspot. Data mining is an important part of current research in artificial intelligence and databases, and association rule mining is one of its branches; it aims to uncover potential connections between different transactions and attributes. This thesis focuses on mining maximal frequent itemsets within association rule mining. Based on different compressed representations of the original database and of the maximal frequent itemsets, it studies the problem from three angles: pruning optimization, search strategy, and superset detection. The main contents are as follows.

First, the classical maximal frequent itemset mining algorithm FPMAX is improved, and an efficient maximal frequent itemset mining algorithm is proposed. The improved algorithm uses a new data structure, the TB-tree, to compress the data, and uses the B-list data structure to represent itemsets, which allows efficient intersection of different itemsets and fast computation of their supports. The algorithm searches the full-order tree with a depth-first strategy and introduces parent equivalence pruning to narrow the search space. Finally, it applies an MFI-tree-based projection pruning strategy for superset detection, which guarantees the correctness of the result. The experimental results show that the improved algorithm achieves higher mining efficiency while preserving the accuracy of the mined maximal frequent itemsets.

Second, NB-MAFIA is an efficient maximal frequent itemset mining algorithm, and this thesis improves on it to obtain an algorithm that performs well in both time and space efficiency. The improved algorithm uses the PPC-tree structure to compress the database and a new data structure, DiffNodeset, to compute the intersection of different itemsets. The key idea is a new linear join method that reduces the complexity of building the DiffNodesets of 2-itemsets, combined with difference-set computation to generate the DiffNodesets of k-itemsets, which greatly improves the efficiency of the algorithm. The set enumeration tree is then used as the search space, and several optimized pruning strategies narrow it. Finally, the superset pruning strategy of the MAFIA algorithm is incorporated to ensure the correctness of the result. The experimental results show that the algorithm mines maximal frequent itemsets effectively on different types of data sets.
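The common skeleton behind both algorithms (depth-first search over a set enumeration tree, followed by superset detection) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes plain transaction-id sets in place of the TB-tree/B-list or PPC-tree/DiffNodeset structures, applies no parent equivalence pruning, and the function name `mine_maximal` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def mine_maximal(transactions: List[Set[str]], min_support: int) -> List[FrozenSet[str]]:
    """Return all maximal frequent itemsets (illustrative helper, not from the thesis)."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Keep frequent items only, in a fixed order that defines the enumeration tree.
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    maximal: List[FrozenSet[str]] = []

    def dfs(prefix: FrozenSet[str], prefix_tids: Set[int], start: int) -> None:
        extended = False
        for k in range(start, len(items)):
            # Support of prefix + item is the size of the tidset intersection.
            tids = prefix_tids & tidsets[items[k]]
            if len(tids) >= min_support:
                extended = True
                dfs(prefix | {items[k]}, tids, k + 1)
        # A node with no frequent extension is a candidate; superset detection
        # discards it if an already-found maximal itemset contains it.
        if not extended and prefix and not any(prefix <= m for m in maximal):
            maximal.append(prefix)

    dfs(frozenset(), set(range(len(transactions))), 0)
    return maximal
```

For example, with the transactions `{a,b,c}`, `{a,b,c}`, `{a,b}`, `{c,d}`, `{c,d}` and a minimum support of 2, the frequent itemsets `{a,c}`, `{b,c}` and `{d}` are rejected by the superset check, leaving `{a,b,c}` and `{c,d}`. The linear scan over `maximal` is the step that structures such as the MFI-tree are designed to speed up.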
Keywords/Search Tags:data mining, association rules, maximal frequent itemsets, search, pruning