Font Size: a A A

Discovering Productive And Non-Redundant Itemsets In Dynamic Databases

Posted on:2019-09-17Degree:MasterType:Thesis
Country:ChinaCandidate:X LiFull Text:PDF
GTID:2428330590973930Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Frequent itemset mining has been widely studied and applied in numerous domains.It consists of finding sets of items(itemsets)that frequently occur in a set of database records(also called transactions).Although discovering frequent itemsets is useful,it can produce a large number of spurious patterns.As a result,the user may spend a great amount of time to analyze patterns found by a frequent itemset mining algorithm to find truly interesting patterns.Hence,in recent years,a key research topic has emerged which is to discover statistically significant patterns in databases.One of the most popular models for discovering statistically significant itemsets is to discover non-redundant productive itemsets.The state-of-the-art algorithm to extract this set of patterns is OPUS-Miner.An important limitation of this algorithm is that it is designed to be applied to a static database.Moreover,another limitation of OPUS-Miner is that it discovers all patterns in a database.OPUS-Miner relies on a user-specified metric(lift,leverage,or other metrics)to reduce the search space.In other words,the algorithm traverses an entire database using a pruning strategy to find patterns and the user cannot search for itemsets containing some specific items.This dissertation addresses these issues by defining the novel problem of discovering targeted non-redundant productive itemsets in dynamic databases.The proposed approach called IDPI+(Interactive Discovery of Productive Itemsets)stores transactions in a tree structure,which can then be interactively queried to identify productive and non-redundant itemsets containing specific items.A novel structure named Query-Tree is introduced to efficiently process multiple queries at the same time.Moreover,to handle dynamic databases,efficient transaction insertion and deletion algorithms are provided to update a MEIT(memory efficient itemsets tree),an optimized version of the Itemset-Tree data structure used to store dynamic databases.A novel algorithm is proposed to query the support of itemsets using an MEIT and a Query-Tree.Results of experiments on benchmark datasets containing various types of data show that the proposed approach can process thousands of queries per second on a desktop computer,and that it is up to 27 times faster than a baseline approach.
Keywords/Search Tags:significant patterns, itemsets tree, productive itemset, query-tree
PDF Full Text Request
Related items