Research On High Average-utility Itemsets Mining Algorithm

Posted on: 2019-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z He
Full Text: PDF
GTID: 2428330545973987
Subject: Computer Science and Technology
Abstract/Summary:
Association rule mining discovers relationships between items in a transaction database. It uses meaningful metrics to identify strong rules, providing useful information for decision-makers. Association rule mining has been widely applied in many fields, including commercial decision-making and recommendation, bioengineering and medicine, and scientific research. However, the conventional approach is frequent itemset mining under the support-confidence framework, which emphasizes the frequency of itemsets and ignores the differences between items. As a result, it is likely to miss low-support rules with high utility. Introducing a utility measure into association rule mining yields high utility itemset mining (HUIM), which overcomes this drawback. HUIM measures the interestingness of an itemset by its utility, which accounts for both the differences between items and their frequency, and so produces more practical results. However, the utility of an itemset tends to grow with its length, and in long itemsets most of the utility often comes from only some of the items. To evaluate itemsets more objectively, high average-utility itemset mining (HAUIM) has been proposed. Nevertheless, existing HAUIM algorithms require users to have enough background knowledge and experience to set the necessary minimum utility threshold. This thesis focuses on Top-k HAUIM, which replaces the hard-to-set minimum utility threshold with a more intuitive parameter: the number of itemsets to return. In response to the growing number of data stream systems, the thesis also proposes two efficient algorithms for mining high average-utility itemsets in data streams. The main contributions of the thesis are as follows:

(1) An efficient Top-k HAUIM algorithm named TKAU is proposed. Based on the utility-list structure, TKAU converts transaction data into lists and obtains the lists of longer itemsets by recursively joining lists. The algorithm reads utilities directly from the lists, avoiding repeated database scans. Two pruning strategies, EMUP and EA, are proposed, which greatly reduce the search space and the number of list join operations. Based on the characteristics of the Top-k itemset mining problem, three threshold-raising strategies, RIU, CAD and EPBF, are designed to raise the minimum utility threshold quickly and avoid fruitless search.

(2) Two algorithms, HAUIS-list and HAUIS-pd, are proposed to mine high average-utility itemsets in a data stream environment. HAUIS-list combines TKAU with the sliding window model and fast list update operations. HAUIS-pd, based on a mapped database, uses transaction mapping and merging techniques to continually reduce the size of the transaction database that must be scanned, and quickly computes the utility of itemsets. Combined with an efficient pruning strategy, HAUIS-pd achieves excellent time and space efficiency.
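To make the difference between utility and average utility concrete, the following is a minimal Python sketch. It assumes the definition commonly used in the HAUIM literature, which the abstract does not spell out: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and the average utility divides that sum by the number of items in the itemset. The toy transactions, the unit profits and the function names are all hypothetical. The top-k search is a plain brute-force enumeration, not TKAU: it uses no utility-lists and none of the EMUP, EA, RIU, CAD or EPBF strategies.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset, transactions, unit_profit):
    """Total utility of `itemset` over the transactions that contain all of it."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def average_utility(itemset, transactions, unit_profit):
    """Utility divided by itemset length (assumed standard HAUIM definition)."""
    return utility(itemset, transactions, unit_profit) / len(itemset)


def top_k_brute_force(transactions, unit_profit, k):
    """Return the k itemsets with the highest average utility by full enumeration.

    The smallest value in the heap acts as an internal threshold that rises
    as better itemsets are found, so the user never has to supply one.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (average utility, itemset)
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            au = average_utility(itemset, transactions, unit_profit)
            if au == 0:
                continue  # itemset never occurs in any transaction
            if len(heap) < k:
                heapq.heappush(heap, (au, itemset))
            elif au > heap[0][0]:
                heapq.heapreplace(heap, (au, itemset))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # {a, c} has higher utility than {a} (19 vs 15) but lower average utility.
    print(utility(("a", "c"), TRANSACTIONS, UNIT_PROFIT))          # 19
    print(average_utility(("a", "c"), TRANSACTIONS, UNIT_PROFIT))  # 9.5
    print(top_k_brute_force(TRANSACTIONS, UNIT_PROFIT, k=3))
```

On this toy data the itemset {a, c} has a higher utility than {a} alone, yet a lower average utility, which is the length bias that the average-utility measure is meant to correct. Brute-force enumeration is exponential in the number of items and is only meant to illustrate the problem statement.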
Keywords/Search Tags: Data mining, Association rule mining, Top-k high average-utility itemset mining