
An Efficient Algorithm For Discovering High Utility Itemsets With Negative Item Values In Large Databases

Posted on: 2011-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: Kouassi Kouadio Serge Olivier
GTID: 2178360308469176
Subject: Computer Science
Abstract/Summary:
Utility mining aims to identify the itemsets with the highest utilities. The items in an itemset usually carry different values, such as utilities. Previous work has generally assumed that these values are positive; in some applications, however, an itemset may be associated with negative item values. Discovering high utility itemsets with negative item values is an important step in mining interesting patterns, such as association rules, from large databases, and it is of substantial practical benefit. Our contribution identifies high utility itemsets effectively by generating fewer high transaction-weighted utilization itemsets, so that the execution time of mining the high utility itemsets is reduced substantially. In this way, all high utility itemsets can be discovered, with negative item values taken into account, using less memory space.
Most algorithms for finding frequent itemsets were designed for traditional databases. However, the frequency of an itemset may not be a sufficient indicator of its significance, because frequency reflects only the number of transactions in the database that contain the itemset. It does not reveal the utility of the itemset, which can be measured in terms of cost, profit, or other expressions of user preference. Frequent itemsets may contribute only a small portion of the overall profit, whereas non-frequent itemsets may contribute a large portion.
In practice, a retail business may be most interested in identifying its most valuable customers, that is, the customers who contribute a major fraction of the company's profits. Frequency alone is therefore not sufficient to answer questions such as whether an itemset is highly profitable or whether it has a strong impact.
Mining high utility itemsets with negative item values plays an essential role in the theory and practice of many important data mining tasks, such as mining association rules, long patterns, emerging patterns, and dependency rules. Its goal is to identify the high utility itemsets, which drive a large portion of the total utility.
To illustrate negative item values in utility mining, consider one scenario. Many supermarkets promote certain items to attract customers: customers who buy specific items receive free goods. The free goods have negative value for the supermarket, but the supermarket may earn higher profits from the other items that are cross-promoted with them. This practice is common. For example, suppose a customer who buys four units of item A receives one free unit of item B as a promotion. Suppose further that the supermarket earns 4 dollars of profit on each unit of item A sold and loses 3 dollars on each unit of item B given away. Although giving away a unit of item B costs the supermarket 3 dollars, it earns 16 dollars from the four units of item A that are cross-promoted with item B, for a net gain of 13 dollars from the promotion. The utility of an itemset X, denoted u(X), is defined as the sum of the utilities of X over all transactions that contain X.
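The definition of u(X) and the promotion example can be checked with a minimal sketch. The data layout (a transaction as a dict from item to quantity, a separate table of per-unit utilities) and the function name `itemset_utility` are assumptions made for illustration; they are not taken from the thesis.

```python
from typing import Dict, Iterable, List

# Assumed representation: a transaction maps each item to its quantity,
# and unit_utility maps each item to its (possibly negative) per-unit value.
Transaction = Dict[str, int]


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Transaction],
                    unit_utility: Dict[str, int]) -> int:
    """u(X): sum of the utilities of X over all transactions containing X."""
    items = list(itemset)
    total = 0
    for t in transactions:
        # Only transactions that contain every item of X contribute.
        if all(item in t for item in items):
            total += sum(t[item] * unit_utility[item] for item in items)
    return total


# Promotion example: 4 dollars profit per unit of A, 3 dollars loss per unit of B.
unit_utility = {"A": 4, "B": -3}
transactions = [{"A": 4, "B": 1}]  # four units of A bought, one unit of B given away

net_gain = itemset_utility({"A", "B"}, transactions, unit_utility)
print(net_gain)  # 16 - 3 = 13
```

With a single promotional transaction, u({A, B}) equals the 13-dollar net gain computed in the text, while u({B}) alone is -3.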
Traditional association rule mining models assume that the utility of each item is always 1 and that the sales quantity is either 0 or 1; they are thus a special case of utility mining, in which the utility and the sales quantity of each item can be any number. If u(X) is greater than a specified utility threshold, X is a high utility itemset; otherwise, it is a low utility itemset.
This thesis presents an efficient algorithm, HUIWNIV-Mine, for discovering high utility itemsets with negative item values in large databases. HUIWNIV-Mine improves response time by reducing the number of candidate itemsets and the CPU and I/O cost, working on transactions from which the negative-valued items have been removed. In essence, after removing the items with negative values from each transaction, HUIWNIV-Mine applies a filtering threshold to the transaction-weighted utilization itemsets (TWUI) that are generated. HUIWNIV-Mine may overestimate the utility of some low utility itemsets, but it never underestimates the utility of any itemset, so it never loses an itemset that may be of high utility. An itemset in which every item has a negative value can never be a high utility itemset: at least one item in an itemset must have a positive value, otherwise no database scan is needed for that itemset. Hence, after filtering, HUIWNIV-Mine outputs the real high utility itemsets among the high transaction-weighted utilization candidates.
The novel contribution of HUIWNIV-Mine is that it identifies high utility itemsets with negative item values from fewer high TWUIs, which reduces the execution time of mining all high utility itemsets with negative item values in large databases. HUIWNIV-Mine is promising for mining high utility itemsets in large databases with negative item values.
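The filtering idea described above can be sketched as follows. This is not the thesis's HUIWNIV-Mine implementation: candidate generation is replaced by brute-force enumeration of all itemsets, and the function names, the data layout, and the exact form of the bound (transaction utility computed over positive-valued items only) are assumptions for illustration. Because negative items can only lower a utility, this bound never underestimates u(X), so pruning on it cannot lose a high utility itemset.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Transaction = Dict[str, int]  # assumed layout: item -> quantity


def positive_transaction_utility(t: Transaction, unit_utility: Dict[str, int]) -> int:
    """Utility of a transaction with its negative-valued items removed."""
    return sum(q * unit_utility[i] for i, q in t.items() if unit_utility[i] > 0)


def twu(itemset: Tuple[str, ...], transactions: List[Transaction],
        unit_utility: Dict[str, int]) -> int:
    """Transaction-weighted utilization: an upper bound on u(itemset)."""
    return sum(positive_transaction_utility(t, unit_utility)
               for t in transactions if all(i in t for i in itemset))


def exact_utility(itemset: Tuple[str, ...], transactions: List[Transaction],
                  unit_utility: Dict[str, int]) -> int:
    """u(itemset), negative item values included."""
    return sum(sum(t[i] * unit_utility[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))


def mine_high_utility(transactions: List[Transaction],
                      unit_utility: Dict[str, int],
                      threshold: int) -> Dict[Tuple[str, ...], int]:
    items = sorted(unit_utility)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):  # brute force, for illustration only
            # An itemset without any positive-valued item cannot be high utility.
            if not any(unit_utility[i] > 0 for i in combo):
                continue
            # Filter: a TWU at or below the threshold rules the itemset out.
            if twu(combo, transactions, unit_utility) <= threshold:
                continue
            # Only surviving candidates need an exact utility computation.
            u = exact_utility(combo, transactions, unit_utility)
            if u > threshold:
                result[combo] = u
    return result


# Hypothetical data: A earns 4 per unit, C earns 1, B is a giveaway costing 3.
unit_utility = {"A": 4, "B": -3, "C": 1}
transactions = [{"A": 4, "B": 1}, {"A": 1, "C": 2}, {"B": 2, "C": 1}]
print(mine_high_utility(transactions, unit_utility, threshold=10))
# {('A',): 20, ('A', 'B'): 13}
```

In this toy database, {B} is skipped without any scan, {C}, {A, C}, {B, C} and {A, B, C} are pruned by the bound, and only {A} and {A, B} reach the exact utility check.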
Keywords/Search Tags: data mining, association rule mining, transaction-weighted utilization, high utility, utility threshold