Research on High Frequent-Utility Itemset Mining

Posted on:2018-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:L Yang

GTID:2348330536481715

Subject:Computer technology

Abstract/Summary:

With the rapid growth of information technologies, frequent itemset mining (FIM) and high-utility itemset mining (HUIM) have been widely studied, since they can be applied to decision-making in various applications. FIM, which is based on the occurrence frequency of items, is a fundamental topic in data mining. HUIM was designed to consider both the quantity and the unit utility of the items in the database, and it has emerged as a critical issue in recent decades. Traditional FIM and HUIM algorithms have to handle the “exponential problem” of a very large search space when the number of distinct items or the size of the database is very large. Both FIM and HUIM also require a minimum support threshold or a minimum utility threshold to be set in order to reveal the required information. Moreover, neither FIM nor HUIM alone can meet the needs of practical applications, since each considers only one evaluation measure when finding the required information. The main contents and contributions of this dissertation are described as follows.

In the first part of this dissertation, an efficient PSO-based algorithm, namely HUIM-BPSO, is proposed to find high-utility itemsets (HUIs). The designed HUIM-BPSO algorithm first finds the high-transaction-weighted utilization 1-itemsets (1-HTWUIs) under the transaction-weighted utility (TWU) model and uses their number as the particle size, which greatly reduces the combinatorial problem in the evolution process. An OR/NOR-tree structure is further developed to reduce the invalid combinations generated while discovering HUIs. The designed algorithm can find most HUIs without searching all the itemsets, and it performs better than the state-of-the-art GA-based algorithm.

In the second part of this dissertation, an effective algorithm called skyline frequent-utility (SFU)-Miner and its improved variants are presented to mine the skyline frequent-utility patterns (SFUPs) based on the utility-list structure. The skyline concept considers more than one factor and returns a set of points for decision-making. The utility-list structure is applied in the designed algorithm to efficiently obtain the actual utility of the itemsets without candidate generation. In addition, two arrays called uemax and ugmax are developed to greatly reduce the enormous search space for finding the skyline points, so that the itemsets that are non-dominated with respect to the utility and frequency measures can be found efficiently. Since the patterns in the SFUPs do not dominate each other, the set of skyline points is much more concise than the set of rules discovered by traditional mining approaches. Furthermore, with the designed skyline approaches the set of points can be discovered without setting any minimum threshold.

Overall, this dissertation brings together the theoretical exploration of frequent itemset mining, high-utility itemset mining and skyline multi-objective itemset mining. A large number of experiments are conducted to verify the effectiveness of the proposed algorithms. The research issues of HUIM and skyline frequent-utility pattern mining will be explored further in future work.
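The two measures that the abstract relies on, TWU and the frequency/utility skyline, can be illustrated with a minimal Python sketch. This is not the HUIM-BPSO or SFU-Miner algorithm from the dissertation: it enumerates every itemset by brute force, with no particle swarm, OR/NOR-tree, utility-list, uemax or ugmax pruning. The toy database, the unit utilities, the threshold `MIN_UTILITY` and all function names are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"b": 3},
]
# Hypothetical unit utilities (for example, profit per unit sold).
UNIT_UTILITY = {"a": 5, "b": 2, "c": 10}


def transaction_utility(tx):
    """Total utility of one transaction (quantity times unit utility, summed)."""
    return sum(qty * UNIT_UTILITY[item] for item, qty in tx.items())


def twu(itemset):
    """Transaction-weighted utility: summed utility of all transactions containing the itemset."""
    return sum(
        transaction_utility(tx)
        for tx in TRANSACTIONS
        if all(item in tx for item in itemset)
    )


def frequency_and_utility(itemset):
    """Return (support count, actual utility) of the itemset over the database."""
    freq, util = 0, 0
    for tx in TRANSACTIONS:
        if all(item in tx for item in itemset):
            freq += 1
            util += sum(tx[item] * UNIT_UTILITY[item] for item in itemset)
    return freq, util


def skyline(points):
    """Keep the itemsets that no other itemset dominates on (frequency, utility)."""
    result = {}
    for itemset, (f, u) in points.items():
        dominated = any(
            f2 >= f and u2 >= u and (f2 > f or u2 > u)
            for f2, u2 in points.values()
        )
        if not dominated:
            result[itemset] = (f, u)
    return result


items = sorted(UNIT_UTILITY)

# 1-HTWUIs: single items whose TWU reaches a (hypothetical) minimum utility threshold.
MIN_UTILITY = 35
htwui_1 = [item for item in items if twu((item,)) >= MIN_UTILITY]

# Brute-force enumeration of all itemsets; only feasible for toy data.
points = {
    itemset: frequency_and_utility(itemset)
    for size in range(1, len(items) + 1)
    for itemset in combinations(items, size)
}
sfups = skyline(points)

print("1-HTWUIs:", htwui_1)
print("Skyline frequent-utility patterns:", sfups)
```

On this toy data, items a and c qualify as 1-HTWUIs (TWU 46 and 37) while b does not (TWU 32). The skyline contains only {a} with frequency 3 and utility 20, and {a, c} with frequency 2 and utility 35; every other itemset is dominated by one of the two. Note that the skyline step needed no threshold, which is the property the abstract highlights.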

Keywords/Search Tags:

data mining, high-utility itemset, skyline concept, skyline frequent-utility pattern

Related items

1	Research On Skyline Pattern Mining Algorithm
2	Research On Novel Methods In Utility Pattern Mining
3	Research On Key Technologies Of High Utility Itemset Mining
4	Research On High Utility Pattern Mining Methods In Data Stream
5	Research On Frequent And Closed High Utility Itemset Mining Algorithm Based On Spark
6	Research On High Utility Pattern Mining Method For Big Data
7	Research On High Utility Pattern Mining Technology
8	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
9	Research On Frequent And High-utility Itemset Mining Algorithms Over Data Stream
10	Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining