Research Of High Frequent-utility Itemset Mining

Posted on: 2018-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2348330536481715
Subject: Computer technology
Abstract/Summary:
With the rapid growth of information technologies, frequent itemset mining (FIM) and high-utility itemset mining (HUIM) have been widely studied because they support decision-making in a variety of applications. FIM, which is based on the occurrence frequency of items, is a fundamental topic in data mining. HUIM was designed to consider both the quantity and the unit utility of the items in the database, and it has emerged as a critical issue in recent decades. Traditional FIM and HUIM algorithms must cope with an exponentially large search space when the number of distinct items or the size of the database is very large. Both FIM and HUIM also require a minimum support threshold or a minimum utility threshold to be set before the required information can be revealed. Moreover, neither FIM nor HUIM meets the needs of practical applications, since each considers only one evaluation measure. The main contents and contributions of this dissertation are described as follows.

In the first part of this dissertation, an algorithm based on particle swarm optimization (PSO), named HUIM-BPSO, is proposed to efficiently find high-utility itemsets (HUIs). Under the transaction-weighted utility (TWU) model, HUIM-BPSO first finds the high-transaction-weighted utilization 1-itemsets (1-HTWUIs) and uses their number as the particle size, which greatly reduces the combinatorial problem in the evolution process. An OR/NOR-tree structure is further developed to reduce the invalid combinations considered when discovering HUIs. The designed algorithm can find most HUIs without searching all itemsets and performs better than the state-of-the-art algorithm based on genetic algorithms (GA).

In the second part of this dissertation, an algorithm called skyline frequent-utility (SFU)-Miner and its improved variants are presented to mine skyline frequent-utility patterns (SFUPs) based on the utility-list structure. The skyline concept considers more than one factor and returns a set of points for decision-making. The utility-list structure is used in the designed algorithm to efficiently obtain the actual utility of itemsets without candidate generation. In addition, two arrays called uemax and ugmax are developed to greatly reduce the search space for finding the skyline points. This pruning can be used to efficiently find the itemsets that are non-dominated with respect to the utility and frequency measures. Because the discovered SFUPs do not dominate one another, the set of skyline points is much more concise than the set of rules discovered by traditional mining approaches. Moreover, the designed skyline approaches discover this set of points without requiring any minimum threshold to be set.

Overall, this dissertation brings together theoretical work on frequent itemset mining, high-utility itemset mining and skyline multi-objective itemset mining. A large number of experiments are conducted to verify the effectiveness of the proposed algorithms. HUIM and skyline frequent-utility pattern mining will be explored further in future work.
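
To make the measures in the abstract concrete, the following is a minimal Python sketch on a hypothetical toy database. It computes the TWU of a single item and then finds the skyline frequent-utility patterns by brute force, using the usual Pareto dominance on (frequency, utility). It is not SFU-Miner or HUIM-BPSO: it uses no utility-lists, no uemax/ugmax arrays and no PSO, and it enumerates every itemset, which is only feasible for a handful of items. The data and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"b": 3},
]
# Hypothetical unit utility (e.g. profit) of each item.
unit_profit = {"a": 5, "b": 2, "c": 10}


def transaction_utility(tx):
    """Total utility of one transaction: sum of quantity * unit utility."""
    return sum(qty * unit_profit[item] for item, qty in tx.items())


def twu(item):
    """Transaction-weighted utility: total utility of all transactions containing the item."""
    return sum(transaction_utility(tx) for tx in transactions if item in tx)


def frequency_and_utility(itemset):
    """Count supporting transactions and sum the itemset's utility over them."""
    freq, util = 0, 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            freq += 1
            util += sum(tx[item] * unit_profit[item] for item in itemset)
    return freq, util


def skyline_patterns():
    """Brute-force skyline: keep itemsets not dominated on (frequency, utility)."""
    items = sorted(unit_profit)
    scored = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            freq, util = frequency_and_utility(itemset)
            if freq > 0:
                scored[itemset] = (freq, util)

    def is_dominated(pattern):
        freq, util = scored[pattern]
        # Another pattern dominates if it is at least as good on both
        # measures and strictly better on at least one of them.
        return any(
            f2 >= freq and u2 >= util and (f2 > freq or u2 > util)
            for other, (f2, u2) in scored.items()
            if other != pattern
        )

    return {p: s for p, s in scored.items() if not is_dominated(p)}


if __name__ == "__main__":
    print(twu("a"))
    print(skyline_patterns())
```

On this toy database no threshold has to be chosen: the skyline simply keeps the patterns for which no other pattern is at least as frequent and at least as useful while being strictly better on one of the two measures.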
Keywords/Search Tags:data mining, high-utility itemset, skyline concept, skyline frequent-utility pattern