
Research On High Average-utility Itemsets Mining Algorithm

Posted on: 2021-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2428330611980630
Subject: Computer science and technology
Abstract/Summary:
High average-utility itemset mining has attracted much attention in data mining because it gives a more balanced measure of utility: it takes into account not only the profit and quantity of the items in an itemset but also the length of the itemset. This thesis first proposes HAUIM-GMU, a high average-utility itemset mining algorithm based on the upper bound of utility summation. For this algorithm, we first extend the typical upper bounds on maximum utility and average-utility from a single item to an itemset and discuss the rationality of the extension; then, based on the concept of support, we propose a new pruning strategy; finally, we describe the algorithm in detail. Extensive experiments on real and synthetic datasets show that the algorithm performs well.

Although a variety of algorithms have been proposed for mining high average-utility itemsets in recent years, it is still difficult to choose a minimum average-utility threshold that controls the mining results effectively and accurately. Inspired by Top-K frequent itemset mining and Top-K high utility itemset mining, this thesis proposes a Top-K high average-utility itemset mining algorithm based on the cross-entropy method, in which the user specifies K, the expected number of high average-utility itemsets to be mined, instead of the conventional minimum average-utility threshold. The algorithm does not need specially designed strategies for raising an internal minimum average-utility threshold and reducing the search space; instead, it treats Top-K high average-utility itemset mining as a combinatorial optimization problem. Experimental results show that the algorithm is efficient and memory-saving and can find most of the actual Top-K high average-utility itemsets.

Previous research on mining high average-utility itemsets assumes that the utility of itemsets is positive. However, in some practical applications, the utility of an itemset may be negative. Therefore, discovering high average-utility itemsets with negative utility values is of great significance for pattern mining. This thesis proposes a new mining method: a high average-utility itemset mining algorithm that considers negative utility. The algorithm improves on HAUIM-GMU, greatly reducing the execution time, and mines all high average-utility itemsets with negative utility using less memory, thereby meeting the key time and space efficiency requirements of this task. The experimental evaluation shows that the algorithm is efficient and feasible.
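
To make the average-utility measure concrete, the following is a minimal sketch that computes it by exhaustive enumeration. It is not HAUIM-GMU and uses none of the upper bounds or pruning strategies described above. The toy transactions, the unit profits, the absolute threshold `min_au`, and all function names are assumptions chosen for illustration. The sketch uses the standard definition: the utility of an itemset is summed over the transactions that contain it, then divided by the itemset's length.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profits (a negative value would model an item sold at a loss).
unit_profit = {"a": 5, "b": 2, "c": 1}


def average_utility(itemset, transactions, unit_profit):
    """Total utility of `itemset` over the transactions containing it, divided by its length."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total / len(itemset)


def mine_exhaustive(transactions, unit_profit, min_au):
    """Return every itemset whose average utility is at least `min_au` (no pruning)."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            au = average_utility(itemset, transactions, unit_profit)
            if au >= min_au:
                result[itemset] = au
    return result


def top_k_exhaustive(transactions, unit_profit, k):
    """Return the k itemsets with the largest average utility (threshold-free variant)."""
    everything = mine_exhaustive(transactions, unit_profit, float("-inf"))
    ranked = sorted(everything.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(mine_exhaustive(transactions, unit_profit, min_au=5))
    print(top_k_exhaustive(transactions, unit_profit, k=2))
```

On this toy data, the itemset (a, c) occurs only in the second transaction with utility 2·5 + 1·1 = 11, so its average utility is 11 / 2 = 5.5. Exhaustive enumeration is exponential in the number of items, which is why practical algorithms depend on upper bounds and pruning, or on a heuristic search when only the Top-K itemsets are wanted.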
Keywords/Search Tags: High average-utility itemsets mining, generalized maximum utility, generalized average-utility upper bound, critical support number, Top-K mining, cross-entropy method, negative utility