Research Of High Average Utility Pattern Mining Algorithms

Posted on:2019-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:S F Ren

Full Text:PDF

GTID:2428330566498312

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

The main purpose of data mining is to discover important,interesting and valuable information from various kinds of databases.Frequent itemset mining(FIM)and association rule mining(ARM)are two of the earliest topics of pattern mining.FIM discovers all itemset whose frequency is greater than the user preferred minimum support threshold.ARM analyze the correlation of discovered itemsets and find all association rules whose confidence is greater than the user preferred minimum confidence.FIM just consider the frequency of itemsets in database,not involving other information.In order to find more interesting patterns,such as high profit goods,high utility itemset mining(HUIM)was proposed by considering both quantities and unit profit of items.The task of HUIM can be viewed as an extension of FIM.Owing to not consider the length of itemset,HUIM tend to discover invalid long patterns in which all items have very low utility.By considering both of utility and length of itemsets,high average utility itemset mining(HAUIM)was proposed recently.Based on the concept of HAUIM,this dissertation focuses on three aspects stated as follows.Most of previous works focus on discovering high utilty itemsets from static databases.All of them cannot be efficiently applied to incremental databases.Thus,the dissertation proposes a tree-based algorithm IHAUPM for solving such problem.The algorithm maintains a modified HAUP-tree in main memory,which efficiently store the necessary information of database.When new transactions coming,it's necessary to scan these new transactions once to retrieve the necessary information.Based on our proposed fast updateing strategy,the tree in main memory is updated.Compared with current algorithms,it is almost unnecessary for IHAUPM to multiply rescan the original database,which saves amounts of computations and discovers the same itemsets.Traditional upper bound model is inefficient to shrink the search space.The dissertation design two new upper bound model and three pruning strategies based the two upper bound model.The proposed upper bound model prove to be smaller than traditional upper bound model in practice and in theory.Therefore,the proposed strategies are efficient to shrink the search space and speed up discovering the satisfied itemsets.Because almost previous works focus on mining high average-utility itemsets in single threshold framework,the characteristic of each item is ignored and the “rare pattern” problem appears.Due to each item in database has its own characteristic,too large threshold tend to leading to missing many interesting pattern while too small threshold may lead to discovering many uninteresting pattern.Therefore,the dissertation propose an efficient algorithm MEMU under multiple threshold framework,which can overcome the inefficiency of previous works.Besides,by arranging the itemsets in the proposed sorted enumeration tree,the downward closure of traditional upper bound model is guaranteed.To further improve the performance,the upper bound model proposed in second part are adapted to this problem and three pruning strategies are proposed based on the tighter upper bound model and sorted enumeration tree.By designing the compact average utility list structure,all necessary information is stored and the proposed algorithm overcome the drawbacks of re-scanning database repeatedly happened in previous works.In summary,the dissertation solves three basic problems in high average utility itemset mining.Extensive experiments were designed and conducted to show the effectiveness and efficiency of the proposed mothods.

Keywords/Search Tags:

High average utility pattern mining, multiple threshold, upper bound model, transaction data

PDF Full Text Request

Related items

1	Research On High Utility Pattern Mining Methods In Data Stream
2	Research On High Average-utility Itemsets Mining Algorithm
3	Algorithmic Research For Mining High Average Fuzzy Utility Itemset With Multiple Minimum Utility Thresholds
4	Research On Novel Methods In Utility Pattern Mining
5	An Efficient Algorithm For Discovering High Utility Itemsets With Negative Item Values In Large Databases
6	Research On High Utility Pattern Mining Technology
7	Research Of High Utility Sequential Pattern Mining
8	Research On Improvement Of High Utility Pattern Mining Algorithm Over Data Streams
9	Research On High Average-utility Itemsets Mining Algorithm
10	Earch On Frequent And High Utility Itemset Mining Algorithms