Design And Implementation Of High Utility Itemset Mining Algorithm On Massive Data

Posted on:2022-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:R X Zhu

GTID:2518306572969369

Subject:Computer technology

Abstract/Summary:

High utility itemset mining on massive data is an important query. It extends frequent itemset mining to the case where items carry weights. This thesis studies two variants of the problem: high utility itemset mining and top-k high utility itemset mining. High utility itemset mining returns all itemsets whose utility is not less than a given threshold, while top-k high utility itemset mining returns the k itemsets with the highest utility. The two problems are defined by different conditions and provide corresponding decision support for users.

First, this thesis studies high utility itemset mining on massive data and proposes the baseline algorithm BA (Baseline Algorithm). BA performs one scan to divide data that does not fit in memory into partitions, processes each partition, and returns the high utility itemsets. Because BA generates a large number of partitions, this thesis proposes the algorithm HIM (High utility Itemset mining on Massive data), which is based on two rounds of scanning and reduces the number of partitions generated. HIM builds a partition matrix and a partition array, and generates fewer partitions by using the partition array and the partition threshold. As each partition is processed, two strategies are used to build the appropriate RCAULs, and the enhanced singleton and closure theorem is used to end the recursive processing early. The experimental results show that HIM can efficiently perform high utility itemset mining on massive data.

Second, to handle top-k high utility itemset mining, this thesis proposes the algorithm KUIM (top-K high Utility Itemset mining on Massive data). KUIM also contains two stages. In the pre-processing stage, in addition to generating the corresponding partitions, a number of pre-computations are performed to initialize the internal threshold and facilitate subsequent pruning. In the formal processing stage, KUIM uses the pre-computed matrices MU and LIU to initialize the threshold, processes each partition and each transaction in that partition with four strategies to build RCAUL structures that take up less memory, updates candidate itemsets in a min-heap, and returns all itemsets in the heap. The experimental results show that the pre-processing algorithm of KUIM produces fewer partitions and deals with these partitions faster than the state-of-the-art algorithms.

Finally, this thesis takes the HIM and KUIM algorithms as the core of a high utility itemset mining system for massive data. The system supports both kinds of high utility itemset mining.
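
To make the two problem definitions concrete, the following is a minimal brute-force sketch in Python. It is not BA, HIM, or KUIM: it enumerates every itemset in memory, which is only feasible for toy data. The function names, the sample transactions, and the unit profits are all hypothetical. It assumes the usual formulation in which each transaction stores a quantity per item and each item has a unit profit as its weight. The top-k version keeps candidates in a min-heap, so the smallest utility in the heap acts as the internal threshold.

```python
import heapq
from itertools import combinations

def itemset_utility(itemset, transactions, profits):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profits[item] for item in itemset)
    return total

def all_itemsets(transactions):
    """Yield every non-empty itemset over the items seen (exponential, toy data only)."""
    items = sorted({item for t in transactions for item in t})
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)

def high_utility_itemsets(transactions, profits, min_util):
    """Return all itemsets whose utility is not less than min_util."""
    result = {}
    for s in all_itemsets(transactions):
        u = itemset_utility(s, transactions, profits)
        if u >= min_util:
            result[s] = u
    return result

def top_k_itemsets(transactions, profits, k):
    """Return the k itemsets with the highest utility, tracked in a min-heap."""
    heap = []  # entries are (utility, itemset); heap[0] holds the current threshold
    for s in all_itemsets(transactions):
        u = itemset_utility(s, transactions, profits)
        if len(heap) < k:
            heapq.heappush(heap, (u, s))
        elif u > heap[0][0]:
            heapq.heapreplace(heap, (u, s))
    return sorted(heap, reverse=True)

# Hypothetical data: item -> purchased quantity per transaction, item -> unit profit.
profits = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]

print(high_utility_itemsets(transactions, profits, min_util=9))
print(top_k_itemsets(transactions, profits, k=2))
```

On this toy data, the threshold query with a minimum utility of 9 keeps the itemsets {a}, {a, b}, and {a, c}, and the top-2 query keeps {a} and {a, c}. Utility is not monotone under set inclusion, which is why practical algorithms rely on pruning structures and bounds rather than the plain enumeration shown here.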

Keywords/Search Tags:

massive data, high utility itemset mining, top-k high utility itemset mining

Related items

1	Research On Key Technologies Of High Utility Itemset Mining
2	Design And Implementation Of High Utility Itemset Mining Algorithm On Massive Data
3	Research On Frequent And Closed High Utility Itemset Mining Algorithm Based On Spark
4	Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining
5	Research On Accelerating High-utility Itemset Mining Based On Spark
6	Improvement And Application Of High Utility Itemset Mining Algorithm
7	Research On High-utility Itemset Data Mining Based On Distributed Platform
8	Research On Frequent And High-utility Itemset Mining Algorithms Over Data Stream
9	Research Of High Frequent-utility Itemset Mining
10	Research On Algorithm For Mining High Utility Itemset With Negative Item Values