Font Size: a A A

Research Of High Utility Itemsets Mining Methods

Posted on:2021-04-14Degree:MasterType:Thesis
Country:ChinaCandidate:X Y LuFull Text:PDF
GTID:2428330647961933Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
High utility itemsets mining(HUIM)has become the key of research filed in data mining.The goal of high utility itemsets mining is to solve the problem of frequent itemsets mining that only considers the frequency of itemsets.HUIM not only considers the purchase number of items in transitions,but also the unit profit.Almost of all previous HUIM algorithms only consider that the unit profit of items is positive,but sometimes the unit profit could be negative.According to the related pruning strategies of high utility itemsets mining algorithms,a subset of a low utility itemset must be a low utility itemset.But when there are negative utility items in transactions,maybe the subsets of low utility itemsets are high utility.Thus,the high utility itemsets with negative utility were ignored by HUIM algorithms.Furthermore,in the implementations of the existing HUIM algorithms,the minimum utility threshold needs to be set by users.In real-world,the setting threshold would directly affect the number of high utility itemsets generations further greatly affect the efficiency of the HUIM algorithms.Too high threshold makes no result,so that users cannot acquire high utility itemsets(HUIs).But too low threshold makes runtime of HUIM algorithms too long to finish.And HUIM algorithms take up a lot of memory even out of memory.Therefore,it is a difficult problem to set threshold.In view of these problems,this paper both researches setting threshold and dataset with negative utility.The main research as follows:(1)To solve the problem of dataset with negative utility,we proposed EHUIN algorithm(Efficient High Utility Itemsets Mining with Navigate utility).The EHUIN algorithm uses the method in the first scan database from Coverage theory,which coverage the itemset of same transition weighted utility in initial list.After that,with help of tenu formula(transitive extension with negative utility),the sum of utility of itemsets and its transitive extensions compare with minimum utility threshold.If the sum less than the threshold,the transitive extensions will be given up.Furthermore,the EHUIN algorithm adopts Advanced Filtration strategy due to construct utility list.Through the calculation of utility of tuples,the algorithm judges whether the utility lists is high utility lists or not.It can reduce runtime and memory space for improving efficiency.Using 24 experiments and about 600 thousand records,the efficiency of EHUIN algorithm is better than previous algorithms in all kind of dataset,especially dense.(2)To address the problem of setting minimum utility threshold,HUIM algorithms combine with the Top-k algorithm.The setting minimum utility threshold problem turns into the problem of setting number of high utility itemsets.At present,the main research directions of Top-k high utility itemsets mining is to improve data structure and optimize the construction of utility lists,but the memory management after utility list construction is ignored.With the increase of data volume,the utility lists would be more and more constructed for mining the high utility itemsets,but memory resources of utility list which are not used anymore cannot be reallocated so that the previous algorithms take up a large amount of memory spaces and computing resource in systems.To solve the problem,a novel Top-k high utility itemsets mining algorithm TKBPH(Top-k Buffer Pool High Utility Itemsets Mining)is proposed which uses Data Buffer Pool(DBP)structure to store information of itemsets avoiding frequent operation of comparison for searching itemsets.The TKBPH algorithm can dynamically insert and delete information of itemsets in Data Buffer Pool which efficiently reduce memory consumption in process of searching high utility itemsets.Experimental results show that TKBPH algorithm in several kinds of datasets has big advantages both runtime and memory consumption.
Keywords/Search Tags:High utility itemsets mining, Negative itemsets, Data buffer pool, Top-k
PDF Full Text Request
Related items