Font Size: a A A

Research On Key Technologies Of High Utility Itemset Mining

Posted on:2018-04-11Degree:DoctorType:Dissertation
Country:ChinaCandidate:S M GuoFull Text:PDF
GTID:1368330566498941Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As the technologies of Internet,internet of things and cloud computing have grown rapidly,information technology has been incorporated into the traditional applications in politics,economy,military,scientific research and daily life areas,and massive amounts of data have been generated faster than ever before.Moreover,all over the world intelligent mobile devices,sensors,electronic business websites and social networks have produced all kinds of data at every moment.How to make a prompt and effective data analysis from massive amounts of data and extract potential and useful patterns closely related to the customs of daily life is the main concern of governments and enterprises during the information age.For example,securities regulatory commission judges whether it has insider trading in a stock or a stock price manipulator has control the market price of a stock through analyzing the deal price and quantity of buyers and sellers for the stock.The alipay company acquires the customs of consumers through analyzing their shopping lists with the alipay's payment network,and then makes their marketing strategy.Transport department draws up a restrict driving and parking policy to reduce the city traffic jam through acquiring the information of car flow in the road network at all times.Data mining refers to finding important,unknown,potential and useful patterns from databases.High utility itemset mining(HUIM)is a necessary research topic in data mining field.In this paper,we have solved seven problems in HUIM.1)We propose an efficient algorithm for mining high utility itemsets and an efficient algorithm for mining top-k high utility itemsets from transaction databases.The challenging problem existing in the current HUIM algorithms and top-k HUIM algorithms is that during the process of computing the utility of itemsets,too many candidates are produced,a great deal of memory is required and it is time-consuming to verify the candidates.Thus in this paper we propose an efficient algorithm HUITWU for mining high utility itemsets and an efficient algorithm TKHM for mining top-k high utility itemsets from transaction databases.HUITWU adopts a novel tree structure HUITWU-Tree to store the itemsets in the database and a utility database to store the utilities of itemsets in the transactions of the database.During the mining process,for each itemset generated from HUITWU-Tree,its utility in the database is calculated directly through the utility database without any candidate.TKHM adopts the same data structures in HUITWU.During the construction of data structures,TKHM exploits two threshold-raising techniques to raise the minimum utility threshold initialized as 0.During the mining process,TKHM calculates the utility of itemsets in the database directly to raise the minimum utility threshold dynamically without any candidate.Through the experimental evaluations,we can learn that the runtime and memory consumption of HUITWU and TKHM significantly outperform that of the state-of-the-art algorithms.2)We propose an efficient algorithm for mining closed+ high utility itemsets from transaction databases.The challenging problem existing in the current closed+ high utility itemset mining algorithm is that during the process of computing closed+ high utility itemsets,too many candidates are produced.Thus in this paper we propose an efficient algorithm Clo HUI for mining closed+ high utility itemsets from transaction databases without candidates.Clo HUI adopts a novel tree structure HUITWU-Tree+ to store the itemsets and their support in the database,and a utility database to store the utilities of itemsets in the transactions.During the mining process,we propose an efficient strategy to verify closed frequent itemsets,and then for each closed frequent itemset its utility in the database is calculated directly without candidate generation.Through the experimental evaluation,we can learn that the runtime of Clo HUI significantly outperforms that of the state-of-the-art algorithm.3)We propose an efficient algorithm for mining high utility itemsets incrementally and an efficient algorithm for mining high utility itemsets interactively.The challenging problem existing in the current high utility itemset incremental mining algorithms and interactive mining algorithms is that during mining process,too many candidates are produced.Thus we propose an efficient algorithm IHUI-Miner for mining high utility itemsets incrementally and an efficient algorithm FIHM for mining high utility itemsets interactively.IHUI-Miner adopts a novel tree structure IHUIL-Tree to store the itemsets from the original or updated database.During the mining process,the utility of itemsets in the database is calculated directly without candidates.FIHM constructs a tree structure HUITWU-Tree containing all the items in the database,and during the mining process there is no need to generate candidates.Through the experimental evaluations,we can learn that the runtime of IHUI-Miner and FIHM significantly outperform that of the state-of-the-art algorithms.4)We propose an efficient algorithm for mining high utility itemsets and an efficient algorithm for mining top-k high utility itemsets over data streams with sliding window techniques.The challenging problem existing in the current high utility itemset mining algorithms and top-k high utility itemset mining algorithms over a sliding window is that during the mining process too many candidates are produced.Thus in this paper we propose an efficient algorithm HUISW for mining high utility itemsets and an efficient algorithm TK-HIS for mining top-k high utility itemsets over a sliding window.HUISW adopts a novel tree structure HUITWU-Tree+ to store the itemsets from the sliding window.During the mining process,HUISW calculates the utility of itemsets in the sliding window directly without candidates.TK-HIS adopts the same data structure in HUISW,and exploits two threshold-raising strategies to raise the minimum utility threshold initialized to 0.During the mining process,the utility of itemsets in the sliding window is calculated directly to raise the minimum utility threshold dynamically without candidates.Through the experimental evaluations,we can learn that the runtime and memory consumption of HUISW and TK-HIS significantly outperform that of the state-of-the-art algorithms.
Keywords/Search Tags:high utility itemsets, utility mining, incremental mining, interactive mining, closed~+high utility itemset, pattern growth, sliding window
PDF Full Text Request
Related items