Font Size: a A A

Research On High Utility Itemsets Mining Algorithm

Posted on:2011-05-18Degree:MasterType:Thesis
Country:ChinaCandidate:F YanFull Text:PDF
GTID:2178360308469137Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The Association rule mining (ARM) aims at analyzing the interesting correlations between items and extracting rules for users. ARM is one of the most important research fields of data mining and has tremendous applications in business, science and other domains. However, the traditional ARM which generates strong association rules based on "support-confidence" model, only considers the frequency of itemsets, so the user may not be interested in the results, and can not discover some rules that have a low support but a high utility. Utility-based association rule mining has defined to make up for this deficiency. It uses utility to measure the importance of itemsets, reflects the user'preferences, and better meet the needs of decision-making. To improve the efficiency of high utility itemsets mining is the purpose of our study. The main works of the paper include:Analyzed the advantage and shortcoming of the current high utility mining algorithms, and present an efficient algorithm, the fast utility itemsets mining (FUI-Mine). It is an improved way to partition the original database, which results from clustering transactions, not only significantly reduce the search time, but does not need to repeat the scan. Then it constructs a new data structure named FUI-Tree, which is for compressed storage and mining independent. All high utility itemsets are generated by checking the leaves of each FUI-tree, without traversing the tree recursively. Experimental results show that, when the number of the longest of the itemsets in the data sets is smaller, the efficiency of the algorithm is significantly prefect to Two-Phase and CTU-Mine, which are representative ones in current high utility itemsets mining algorithms.FUI-Mine algorithm can quickly get the longer utility patterns, but the shorter utility patterns mining is the bottleneck for whole mining algorithm. So a hybrid algorithm name Hybird-Mine is proposed to solve it. The Hybrid-Mine algorithm combine the FUI-Mine and the column enumeration method, which the former to mine the longer patterns and the other used to mine the shorter ones. In the column enumeration method, it makes the data set to vertical data format, and intersects the transaction to get the all short itemsets directly. Then the back-complement is defined in the paper to achieve the optimization of the column enumeration method, and minimize the number of intersection of the itemset and storage space. The theory of transaction utility closed down also applies to the method and can prune amount of the itemsets, which the transaction weighted utility is lower the minimum utility, and reduces the search space. Experiments show that the Hybird-Mine algorithm perfects the FUI-Mine and improves the mining efficiency of short patterns.Currently, the high utility algorithms proposed are designed to extract the complete patterns. It will achieve amount of utility itemsets when the threshold set too low or exists long pattern in data set. Therefore through analyzes the reality meaning and the mathematical feature of the support count and utility, a high utility itemset mining method combining with the closed pattern constraint is proposed in this paper. It cuts back the number of high utility itemsets without reduce the effect decision for users. Finally, a high utility itemsets mining algorithm based on closed constraint (CHU-Mine) with some experiments is present. It proves that CTU-Mine algorithm efficiently decreases the quantity of mining results, and simultaneously improves the algorithm efficiency.
Keywords/Search Tags:Data mining, Association rule, High utility itemset, Column enumeration, Closed pattern
PDF Full Text Request
Related items