Font Size: a A A

Research On Algorithms For Frequent Itemsets Mining Over Uncertain Data Streams

Posted on:2015-01-20Degree:MasterType:Thesis
Country:ChinaCandidate:K S ZhouFull Text:PDF
GTID:2268330428468666Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Frequent itemset mining is an important task in the data mining community. With the rapid development of computer technology, many real-life applications in the production and life field are capable of generating great amounts of data streams with high speed in real time continuously. The complexity of stream data makes frequent itemsets mining over it much challenging. Moreover, data in many data streams tends to be uncertain and its existence is often represented in the form of probability. Traditional data stream frequent itemset mining algorithms usually assume that data is precise and certain. These algorithms can’t be applied to address the problem of frequent itemset mining over uncertain data streams directly due to the introducing of uncertainty. In addition, summary data structures devised for precise data stream are no longer effective. Therefore, the research and design of efficient summary data structures and mining methods aiming at uncertain data streams become increasingly important.In this paper, the issue of frequent itemsets mining over uncertain data streams is introduced in detail, important algorithms so far in this field are discussed, and a sliding-window-based false-positive oriented algorithm is proposed. The algorithm improves the mining accuracy of the proposed algorithm by introducing maximum possible error to narrow the gap between estimated expected support and true expected support of itemsets. Besides, since the minimum support threshold can’t be properly predefined, this paper also presents a sliding-window-based Top-K frequent itemsets mining algorithm UFIMTopK on the basis of UFIM algorithm.The main work of this paper is summarized as follows,1. Causes, forms and processing models of data uncertainty are introduced in great detail, frequent itemsets mining and Top-K frequent itemsets mining algorithms for uncertain data streams so far are enumerated and the advantages and disadvantages of these algorithms are analyzed and discussed.2. In order to overcome the defects of low mining accuracy of SRUF-mine algorithm, this paper proposes a sliding-window based false-positive oriented UFIM algorithm and depicts the summary data structures and main idea of the corresponding algorithm. Moreover, the performance of UFIM and SRUF-mine are compared using synthetic datasets in terms of running time, memory consumption and mining accuracy. Experimental results show that compared to SRUF-mine, UFIM can achieve higher mining accuracy.3. In view of the issue of minimum support threshold hard to predefine properly, on the basis of the proposed UFIM algorithm, this paper also presents a sliding-window-based Top-K frequent itemsets mining algorithm UFIMTopK, describes its summary data structure, main idea and experimental analysis of its performance. Experimental results show that UFIMTopK can mine Top-K frequent itemsets from uncertain data streams effectively with reasonable running time and memory overhead.
Keywords/Search Tags:Data Mining, Uncertain Data Stream, Frequent Itemsets, Top-K frequentitemsets, Sliding Window
PDF Full Text Request
Related items