Font Size: a A A

Research Of High Average Utility Pattern Mining Algorithms

Posted on:2019-05-16Degree:MasterType:Thesis
Country:ChinaCandidate:S F RenFull Text:PDF
GTID:2428330566498312Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The main purpose of data mining is to discover important,interesting and valuable information from various kinds of databases.Frequent itemset mining(FIM)and association rule mining(ARM)are two of the earliest topics of pattern mining.FIM discovers all itemset whose frequency is greater than the user preferred minimum support threshold.ARM analyze the correlation of discovered itemsets and find all association rules whose confidence is greater than the user preferred minimum confidence.FIM just consider the frequency of itemsets in database,not involving other information.In order to find more interesting patterns,such as high profit goods,high utility itemset mining(HUIM)was proposed by considering both quantities and unit profit of items.The task of HUIM can be viewed as an extension of FIM.Owing to not consider the length of itemset,HUIM tend to discover invalid long patterns in which all items have very low utility.By considering both of utility and length of itemsets,high average utility itemset mining(HAUIM)was proposed recently.Based on the concept of HAUIM,this dissertation focuses on three aspects stated as follows.Most of previous works focus on discovering high utilty itemsets from static databases.All of them cannot be efficiently applied to incremental databases.Thus,the dissertation proposes a tree-based algorithm IHAUPM for solving such problem.The algorithm maintains a modified HAUP-tree in main memory,which efficiently store the necessary information of database.When new transactions coming,it's necessary to scan these new transactions once to retrieve the necessary information.Based on our proposed fast updateing strategy,the tree in main memory is updated.Compared with current algorithms,it is almost unnecessary for IHAUPM to multiply rescan the original database,which saves amounts of computations and discovers the same itemsets.Traditional upper bound model is inefficient to shrink the search space.The dissertation design two new upper bound model and three pruning strategies based the two upper bound model.The proposed upper bound model prove to be smaller than traditional upper bound model in practice and in theory.Therefore,the proposed strategies are efficient to shrink the search space and speed up discovering the satisfied itemsets.Because almost previous works focus on mining high average-utility itemsets in single threshold framework,the characteristic of each item is ignored and the “rare pattern” problem appears.Due to each item in database has its own characteristic,too large threshold tend to leading to missing many interesting pattern while too small threshold may lead to discovering many uninteresting pattern.Therefore,the dissertation propose an efficient algorithm MEMU under multiple threshold framework,which can overcome the inefficiency of previous works.Besides,by arranging the itemsets in the proposed sorted enumeration tree,the downward closure of traditional upper bound model is guaranteed.To further improve the performance,the upper bound model proposed in second part are adapted to this problem and three pruning strategies are proposed based on the tighter upper bound model and sorted enumeration tree.By designing the compact average utility list structure,all necessary information is stored and the proposed algorithm overcome the drawbacks of re-scanning database repeatedly happened in previous works.In summary,the dissertation solves three basic problems in high average utility itemset mining.Extensive experiments were designed and conducted to show the effectiveness and efficiency of the proposed mothods.
Keywords/Search Tags:High average utility pattern mining, multiple threshold, upper bound model, transaction data
PDF Full Text Request
Related items