Discovering Productive And Non-Redundant Itemsets In Dynamic Databases

Posted on:2019-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Li

Full Text:PDF

GTID:2428330590973930

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Frequent itemset mining has been widely studied and applied in numerous domains.It consists of finding sets of items(itemsets)that frequently occur in a set of database records(also called transactions).Although discovering frequent itemsets is useful,it can produce a large number of spurious patterns.As a result,the user may spend a great amount of time to analyze patterns found by a frequent itemset mining algorithm to find truly interesting patterns.Hence,in recent years,a key research topic has emerged which is to discover statistically significant patterns in databases.One of the most popular models for discovering statistically significant itemsets is to discover non-redundant productive itemsets.The state-of-the-art algorithm to extract this set of patterns is OPUS-Miner.An important limitation of this algorithm is that it is designed to be applied to a static database.Moreover,another limitation of OPUS-Miner is that it discovers all patterns in a database.OPUS-Miner relies on a user-specified metric(lift,leverage,or other metrics)to reduce the search space.In other words,the algorithm traverses an entire database using a pruning strategy to find patterns and the user cannot search for itemsets containing some specific items.This dissertation addresses these issues by defining the novel problem of discovering targeted non-redundant productive itemsets in dynamic databases.The proposed approach called IDPI+(Interactive Discovery of Productive Itemsets)stores transactions in a tree structure,which can then be interactively queried to identify productive and non-redundant itemsets containing specific items.A novel structure named Query-Tree is introduced to efficiently process multiple queries at the same time.Moreover,to handle dynamic databases,efficient transaction insertion and deletion algorithms are provided to update a MEIT(memory efficient itemsets tree),an optimized version of the Itemset-Tree data structure used to store dynamic databases.A novel algorithm is proposed to query the support of itemsets using an MEIT and a Query-Tree.Results of experiments on benchmark datasets containing various types of data show that the proposed approach can process thousands of queries per second on a desktop computer,and that it is up to 27 times faster than a baseline approach.

Keywords/Search Tags:

significant patterns, itemsets tree, productive itemset, query-tree

PDF Full Text Request

Related items

1	Research On The Algorithm Of Mining Frequent Itemsets From Uncertain Data Based On The Tree
2	FP-Tree Based Mining Frequent Itemsets Over Data Streams
3	Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams
4	Research On Mining And Querying Frequent Patterns Based On Simplified Frequent Pattern Tree
5	Research On Fast Algorithms For Frequent Itemsets Mining Based On Compressed FP-tree
6	Mining Shared Knowledge\Patterns Between Two Datasets
7	An Research On Mining Frequent Itemsets Algorithm Based On Titled-time Window
8	Research On Data Items Frequent Itemsets Mining Algorithm Based On Sliding Window
9	The Research And Improvement Of XML Tree Pattern Matching Query Algorithm
10	Mining Algorithm Of Frequent Items Based On Item Adjacentcy List And Trasaction Tree