
Design And Implementation Of High Utility Itemset Mining Algorithm On Massive Data

Posted on: 2022-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: R X Zhu
GTID: 2518306572969369
Subject: Computer technology

Abstract/Summary:
High utility itemset mining on massive data is an important type of query. High utility itemset mining extends frequent itemset mining to the case where items carry weights (utilities). This thesis studies two problems: high utility itemset mining and top-k high utility itemset mining. High utility itemset mining returns all itemsets whose utility is not less than a given threshold, while top-k high utility itemset mining returns the k itemsets with the highest utility. The two problems are specified by different conditions (a utility threshold versus a result size k) and provide corresponding decision support for users.

First, this thesis studies high utility itemset mining on massive data. A baseline algorithm, BA (Baseline Algorithm), is proposed. BA performs one scan to divide data that cannot fit into memory into partitions, processes each partition in turn, and returns the high utility itemsets. Because BA generates many partitions, this thesis proposes HIM (High utility Itemset mining on Massive data), which is based on two rounds of scanning and reduces the number of partitions generated. HIM builds a partition matrix and a partition array, and uses the partition array together with a partition threshold to generate fewer partitions. When each partition is processed, two strategies are used to build the appropriate RCAULs, and the enhanced singleton and closure theorems are used to terminate the recursive processing early. The experimental results show that HIM can efficiently perform high utility itemset mining on massive data.

Second, for top-k high utility itemset mining, this thesis proposes the algorithm KUIM (top-K high Utility Itemset mining on Massive data). KUIM also consists of two stages. In the pre-processing stage, besides generating the corresponding partitions, KUIM performs a number of pre-computations that initialize the internal threshold and facilitate subsequent pruning. In the formal processing stage, KUIM uses the first two pre-computed matrices, MU and LIU, to initialize the threshold; processes each partition, and each transaction in that partition, with four strategies to build RCAUL structures that take up less memory; maintains candidate itemsets in a min-heap; and finally returns all itemsets in the heap. The experimental results show that the pre-processing algorithm of KUIM produces fewer partitions than the state-of-the-art algorithms and processes these partitions faster.

Finally, this thesis takes the HIM and KUIM algorithms as its core to implement a high utility itemset mining system for massive data. The system performs both kinds of high utility itemset mining effectively.
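The abstract defines the threshold problem only in words. A minimal in-memory sketch of it follows, assuming the common formulation in which each transaction stores, for every item it contains, that item's utility (quantity times unit profit); u(X) sums the utilities of X's items over the transactions that contain all of X, and X is high utility when u(X) >= min_util. The pruning rule uses transaction-weighted utilization (TWU), a standard upper bound from the high utility mining literature. It is not HIM's RCAUL-based processing or its enhanced singleton and closure theorems, which the abstract does not specify. The function names and the toy database DB are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a transaction maps each item it contains to that item's utility
# in the transaction (quantity x unit profit, already multiplied out).
Transaction = Dict[str, int]


def itemset_utility(itemset: Tuple[str, ...], db: List[Transaction]) -> int:
    """u(X): utilities of X's items summed over every transaction containing all of X."""
    return sum(
        sum(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def twu(itemset: Tuple[str, ...], db: List[Transaction]) -> int:
    """Transaction-weighted utilization: total utility of the transactions containing X.
    It bounds u(X) and u(Y) for every superset Y of X from above."""
    return sum(sum(t.values()) for t in db if all(i in t for i in itemset))


def mine_high_utility(db: List[Transaction], min_util: int) -> Dict[frozenset, int]:
    """Return {X: u(X)} for every itemset X with u(X) >= min_util (in memory)."""
    items = sorted({i for t in db for i in t})
    result: Dict[frozenset, int] = {}

    def extend(prefix: Tuple[str, ...], start: int) -> None:
        # Depth-first enumeration: each itemset is generated once, in sorted item order.
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            if twu(candidate, db) < min_util:
                continue  # neither candidate nor any extension can reach min_util
            utility = itemset_utility(candidate, db)
            if utility >= min_util:
                result[frozenset(candidate)] = utility
            extend(candidate, idx + 1)

    extend((), 0)
    return result


# Hypothetical toy database.
DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3, "e": 3},
    {"a": 5, "b": 6, "c": 3, "d": 3},
]

for itemset, utility in sorted(mine_high_utility(DB, 22).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), utility)
```

On this toy database with min_util = 22 the sketch reports {a, c, d} (33), {a, c} (30), {a, d} (24) and {a, b, c} (22). The single item a (20) misses the threshold even though several of its supersets reach it: utility, unlike support, is not anti-monotone, which is why pruning relies on an upper bound such as TWU rather than on u(X) itself.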
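BA and HIM split data that does not fit in memory into partitions, and HIM uses a second scan plus a partition threshold to produce fewer of them. The abstract does not describe the partition matrix or partition array, so the following sketch only illustrates the general idea under stated assumptions. Items get a fixed order. The first scan estimates, for each item, how large the projections of transactions starting at that item would be (a hypothetical stand-in for a partition array). Consecutive items are merged into one partition while the estimate stays within max_size (a hypothetical partition threshold), and the second scan writes each transaction at most once per partition. Each partition is then mined for the itemsets whose first item belongs to it. All names are illustrative, and partitions are kept in lists rather than written to disk.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: a transaction maps each item to its utility in that transaction.
Transaction = Dict[str, int]


def projection_sizes(db: List[Transaction], order: List[str]) -> Dict[str, int]:
    """Scan 1 (hypothetical): per item, how many item entries the projections
    starting at that item would occupy."""
    rank = {item: r for r, item in enumerate(order)}
    sizes: Dict[str, int] = defaultdict(int)
    for t in db:
        ranked = sorted(t, key=rank.__getitem__)
        for pos, item in enumerate(ranked):
            sizes[item] += len(ranked) - pos
    return sizes


def group_items(order: List[str], sizes: Dict[str, int], max_size: int) -> List[List[str]]:
    """Merge consecutive items into one partition while the size estimate stays within max_size."""
    groups: List[List[str]] = []
    current: List[str] = []
    load = 0
    for item in order:
        if current and load + sizes[item] > max_size:
            groups.append(current)
            current, load = [], 0
        current.append(item)
        load += sizes[item]
    if current:
        groups.append(current)
    return groups


def build_partitions(db, order, groups):
    """Scan 2: each transaction contributes at most one projection to each group it touches."""
    rank = {item: r for r, item in enumerate(order)}
    group_of = {item: g for g, members in enumerate(groups) for item in members}
    parts: List[List[Transaction]] = [[] for _ in groups]
    for t in db:
        ranked = sorted(t, key=rank.__getitem__)
        seen = set()
        for pos, item in enumerate(ranked):
            g = group_of[item]
            if g not in seen:  # t's first item in group g starts its projection
                seen.add(g)
                parts[g].append({j: t[j] for j in ranked[pos:]})
    return parts


def mine_partition(part, order, members, min_util):
    """Mine, from one partition only, the itemsets whose first item lies in members."""
    rank = {item: r for r, item in enumerate(order)}
    result: Dict[frozenset, int] = {}

    def extend(prefix):
        containing = [p for p in part if all(i in p for i in prefix)]
        if sum(sum(p.values()) for p in containing) < min_util:
            return  # local TWU bound: no extension of prefix can reach min_util
        utility = sum(sum(p[i] for i in prefix) for p in containing)
        if utility >= min_util:
            result[frozenset(prefix)] = utility
        for item in order[rank[prefix[-1]] + 1:]:
            extend(prefix + (item,))

    for first in members:
        extend((first,))
    return result


# Hypothetical toy database.
DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3, "e": 3},
    {"a": 5, "b": 6, "c": 3, "d": 3},
]
order = sorted({i for t in DB for i in t})
groups = group_items(order, projection_sizes(DB, order), max_size=12)
parts = build_partitions(DB, order, groups)
found: Dict[frozenset, int] = {}
for members, part in zip(groups, parts):
    found.update(mine_partition(part, order, members, min_util=22))
print(groups)
print(found)
```

With max_size = 12 the five items of the toy database collapse into three partitions, [a], [b] and [c, d, e], instead of one per item, and the union of the per-partition results equals what a single in-memory pass returns. Correctness rests on one property: a transaction containing X contributes exactly one projection containing X to the partition of X's first item, so utilities computed inside that partition are exact.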
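For the top-k problem the abstract describes an internal threshold that starts from pre-computed values and is maintained alongside a min-heap of the best itemsets. The sketch below shows that pattern under assumptions: the same item-to-utility transaction format, TWU as the pruning bound, and, as a hypothetical stand-in for the MU and LIU matrices (whose contents the abstract does not give), an initial threshold equal to the k-th largest single-item utility. That value is safe because k distinct single items already reach it. Once the heap holds k itemsets, its minimum also acts as the threshold, so pruning grows stronger as better itemsets are found. This is not KUIM's four-strategy RCAUL processing; all names are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

# Assumption: a transaction maps each item to its utility in that transaction.
Transaction = Dict[str, int]


def itemset_utility(itemset: Tuple[str, ...], db: List[Transaction]) -> int:
    return sum(sum(t[i] for i in itemset) for t in db if all(i in t for i in itemset))


def twu(itemset: Tuple[str, ...], db: List[Transaction]) -> int:
    # Upper bound on the utility of itemset and of all its supersets.
    return sum(sum(t.values()) for t in db if all(i in t for i in itemset))


def top_k_high_utility(db: List[Transaction], k: int) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return up to k (utility, itemset) pairs with the highest utility, best first."""
    if k <= 0:
        return []
    items = sorted({i for t in db for i in t})
    # Hypothetical initial threshold: k distinct single items already reach the
    # k-th largest single-item utility, so the final k-th best is at least this.
    singles = sorted((itemset_utility((i,), db) for i in items), reverse=True)
    initial = singles[k - 1] if len(singles) >= k else 0
    heap: List[Tuple[int, Tuple[str, ...]]] = []  # min-heap; heap[0] is the k-th best so far

    def threshold() -> int:
        return max(initial, heap[0][0]) if len(heap) == k else initial

    def extend(prefix: Tuple[str, ...], start: int) -> None:
        for idx in range(start, len(items)):
            candidate = prefix + (items[idx],)
            if twu(candidate, db) < threshold():
                continue  # candidate and its extensions cannot enter the top k
            utility = itemset_utility(candidate, db)
            if len(heap) < k:
                heapq.heappush(heap, (utility, candidate))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, candidate))  # evict the current k-th best
            extend(candidate, idx + 1)

    extend((), 0)
    return sorted(heap, reverse=True)


# Hypothetical toy database.
DB: List[Transaction] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6, "d": 6},
    {"b": 4, "c": 3, "e": 3},
    {"a": 5, "b": 6, "c": 3, "d": 3},
]

for utility, itemset in top_k_high_utility(DB, 3):
    print(itemset, utility)
```

On the toy database, top_k_high_utility(DB, 3) returns {a, c, d}, {a, c} and {a, d} with utilities 33, 30 and 24. Here the heap fills after the first three candidates, so its minimum quickly overtakes the initial value of 12; on larger inputs, where the heap may first fill with low-utility itemsets, a good initial threshold can prune considerably more.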
Keywords/Search Tags: massive data, high utility itemset mining, top-k high utility itemset mining