Font Size: a A A

Earch On Frequent And High Utility Itemset Mining Algorithms

Posted on:2016-05-31Degree:MasterType:Thesis
Country:ChinaCandidate:H LiFull Text:PDF
GTID:2308330470457839Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Mining interesting itemsets from transaction database has attracted a lot of research work for more than a decade. Frequent itemsets mining mines those itemsets which appeared in large numbers in the database. There is a close link between these itemsets, which can provide useful information for decision-makers. High utility itemsets mining considers the difference between different transactions and the final targets the users concern. We can make these itemsets be more in line with customer demand, by selecting different utility functions.In fact, these two measures are considered individually, which lead to some shortages. Frequent itemsets may have very low contribution to user’s targets, or high profit itemsets may have very low support. It is meaningless to recommend these itemsets to users.To this end, in this paper we consider these two measures from a unified perspective. Specifically, we propose to identify the qualified itemsets which are both frequent and high utility. Additionally, among all the qualified itemsets, users usually take interests in the most important itemsets. Thus, we also formulate the problem of mining top-k frequent and high utility itemsets, that is, finding k qualified itemsets with maximum values on weighted sum of support and relative utility (i.e. quality).We propose an efficient algorithm named FHIMA (Frequent and High utility Itemset Mining Algorithms). FHIMA incorporates the idea of PrefixSpan to avoid generating infrequent candidates, thus leading to high efficiency. Moreover, an effective upper bound based on frequency and utility is designed to further prune the search space. Finally, the experiment results demonstrate the efficiency of FHIMA on synthetic datasets and the tight quality upper bound is more efficient than the loose bound.
Keywords/Search Tags:Frequent, High utility, Qualified itemsets, Quality upper bound
PDF Full Text Request
Related items