Font Size: a A A

Research On Frequent And High Utility Itemset Mining Algorithms In Association Rule Mining

Posted on:2018-06-20Degree:MasterType:Thesis
Country:ChinaCandidate:J ZhangFull Text:PDF
GTID:2348330536472589Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Using the support as a metric to mine itemsets frequently occur in databases,association rule mining is first used to mine frequent itemsets.After each item in the databases can appear multiple times in one transaction as well as different items may have different weights are considered,frequent itemsets mining is extended to high utility itemsets mining,high utility itemset mining uses a measure users expect to dig out results that better fit the users' needs.This paper aims to study the improvement of time efficiency on frequent itemset mining and high utility itemset mining algorithms in association rule mining,its main contents include the following two aspects:1)Improved Apriori algorithm by combining transaction reduction with twoitem set support matrix to quickly prune the candidate itemsets.Firstly,when computing the support of the candidate itemsets,a self-defined data structure that stores frequent one-set itemsets is proposed to decide which transactions to scan,furthermore,after transaction reducing techqinue is introduced,and an improved algorithm,MR-Apriori algorithm,is propsed in the paper in order to further reduce useless items and transactions in databases;Besides,so as to optimize the pruning process of Apriori algorithm,a self-defined two-item set support matrix which is used to quickly prune the candidate itemsets,is added to the Apriori algorithm and a second improved algorithm,MP-Apriori,is proposed in the paper;Later an improved algorithm combines the improvement strategies of the MR-Apriori algorithm and MP-Apriori algorithm is proposed.Finally,an experiment is conducted on mushroom and T10I4D100 K dataset,the result shows that the running time of the improved MRApriori algorithm and the improved MP-Apriori algorithm is shorter than the original Apriori algorithm,while the running time of the MRP-Apriori algorithm which combines the optimization policies of the MR-Apriori algorithm and the MP-Apriori algorithm is the shortest,thus the time efficiency of these three improved algorithms is verified.2)Research on frequent and high utility itemset mining algorithm base on array preudo projection and transaction merging.After the defects of solely considerate either the support or the utility as a measure are analysed,a algorithm that based on the array pseudo projection data structure as well as recursively constructs its prefix itemsets' projected transaction databases to mine both frequent and high utility itemsets is proposed.This algorithm simultaneously considers support and utility as a metric to mine itemsets in databases that both with high occurrence count and high utility.So as to diminish the search space of the algorithm,two prune methods whichinclude local-utility based prune and subtree-based prune are put forward,based on the model of the algorithm and those prune methods,FUIM-P algorithm is proposed.It is observed that there are many transactions in the databases that can be merged,according to the characteristics of the algorithm,this merge is extended to the projection databases,then tran-saction(projected transaction)merging technique is introduced.At the same time,a self-defined sort rule is proposed to find transactions meet the conditions that can be merged with each other in linear time,then the final FUIM-MP algorithm is propoesd.Finally,an experiment is conducted on mushroom,chess and accident dataset,the result shows that the running time of FUIM-P algorithm is shorter than the compared FHIMA-ALL algorithm,while the final FUIM-MP algorithm which is proposed after transaction(projected transaction)merging is added,improves significantly compared with the previous two algorithms' time efficiency;Moreover,the huge number of transactions(projected transactions)that can be merged on mushroom,chess and accident dataset also prove the effectiveness of transaction(projected transaction)merging technique in improving the time efficiency of the algorithms.
Keywords/Search Tags:Frequent Itemsets, Transaction reducing, Support Matrix, High Utility Itemsets, Pseudo Projection, Transaction Merging, Time Efficiency
PDF Full Text Request
Related items