
Research On Topk High Expected Weight-based Itemsets Mining With Uncertain Datasets

Posted on: 2015-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: M F Wu
GTID: 2298330467486260
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, massive amounts of data are appearing in many application areas. How to extract meaningful information from data and use it effectively has become a focus of scientific research, and data mining technology arose in response to this practical need. Association rule mining, an important branch of data mining, has attracted much attention from researchers. Its purpose is to capture the hidden connections between itemsets, which offers valuable guidance for decision making.
Frequent itemset mining is the key step in generating association rules. Research on frequent itemset mining proceeds mainly along two lines: application extensions and algorithm improvements. The former includes maximal frequent itemsets, high utility itemsets, probabilistic frequent itemsets, and so on; the latter focuses on improving the time and space efficiency of frequent itemset mining algorithms.
This thesis focuses on frequent itemset mining and reviews in detail some classic mining algorithms and their improved variants, moving from traditional data to uncertain data and data streams. Building on an understanding of this work, I find that probabilistic frequent itemset mining takes only item probability into account and ignores the differing importance of items. As a result, some itemsets that occur infrequently but contain important items are lost, which leads to the loss of meaningful information. In addition, considering the difficulty of threshold selection, Topk mining of High Expected Weight-based Itemsets (HEWIs for short) is proposed on the basis of probabilistic frequent itemsets, as a new extension of frequent itemset mining.
The specific contents of this thesis are as follows:
(1) In the context of uncertain data mining, "Topk HEWIs mining" and its meaning are first defined. Then, based on MBP and UF-Growth, the classical algorithms for probabilistic frequent itemset mining, TKWMB and TKWUG are proposed for Topk HEWIs mining, respectively. These two new algorithms represent two directions of frequent itemset mining algorithms: the pattern-growth approach and the level-wise approach. The performance of the two algorithms is analyzed after running them on different datasets. The efficiency comparison leads to the following conclusions: TKWUG is more stable across different kinds of datasets and more efficient on sparse datasets, and its running time changes proportionally with k; TKWMB reacts more strongly to changes in k and tends to run out of memory on sparse datasets, although it runs fast on sparse datasets.
(2) Considering the growing importance of data streams in recent years, TWUS, an extension of TKWUG, is proposed to mine Topk HEWIs from data streams. Taking the characteristics of data streams (single-pass, unidirectional, and infinite) into account, TWUS is built on the sliding window technique and a combination of TKWUG and the CPS Tree. The implementation of TWUS is given in Chapter 4. TWUS stores the data of the current window in a WUS Tree and maintains the tree as data flows by, incrementally updating the tree and its corresponding head table. The algorithm adopts partial updating and delayed processing to mine Topk HEWIs on data streams, and responds to users' mining requests effectively and efficiently.
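To make the idea of Topk HEWIs mining concrete, the following is a minimal brute-force sketch in Python. It is not TKWMB, TKWUG, or TWUS, and the abstract does not state the exact scoring formula. The score used here (mean item weight multiplied by expected support, with item probabilities treated as independent) is an assumption made for illustration only, and the names `expected_weight`, `topk_hewis`, and `max_len` are hypothetical.

```python
from itertools import combinations
import heapq


def expected_weight(itemset, transactions, weights):
    """Assumed score: (mean item weight) * (expected support).

    Each transaction maps an item to its existence probability.
    Probabilities are treated as independent, so the probability that
    an itemset occurs in a transaction is the product over its items.
    """
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    expected_support = 0.0
    for t in transactions:
        p = 1.0
        for i in itemset:
            p *= t.get(i, 0.0)  # a missing item has probability 0
        expected_support += p
    return mean_weight * expected_support


def topk_hewis(transactions, weights, k, max_len=3):
    """Enumerate all itemsets up to max_len and keep the k best scores.

    A size-k min-heap replaces a user-chosen threshold: the smallest
    score in the heap acts as a threshold that rises during the scan.
    """
    items = sorted({i for t in transactions for i in t})
    heap = []  # min-heap of (score, itemset)
    for n in range(1, max_len + 1):
        for itemset in combinations(items, n):
            score = expected_weight(itemset, transactions, weights)
            if len(heap) < k:
                heapq.heappush(heap, (score, itemset))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, itemset))
    # Highest score first; ties broken by itemset for a stable order.
    return sorted(heap, key=lambda e: (-e[0], e[1]))


# Toy uncertain dataset (made-up values, for illustration only).
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"b": 0.8, "c": 0.5},
]
weights = {"a": 0.2, "b": 0.5, "c": 1.0}

top2 = topk_hewis(transactions, weights, k=2)
for score, itemset in top2:
    print(itemset, round(score, 4))
```

In this toy data, items `a` and `b` have the same expected support, but `b` ranks higher because of its larger weight, which is the effect a purely probabilistic measure would miss. The sketch enumerates every candidate without pruning, so it is only practical for very small inputs.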
Keywords/Search Tags:Frequent itemsets, Weight, Topk, Uncertain data, Data stream