Font Size: a A A

Research And Implementation On Frequent Itemsets Mining Algorithms In Uncertain Data Streams Environment

Posted on:2016-01-10Degree:MasterType:Thesis
Country:ChinaCandidate:Y SongFull Text:PDF
GTID:2428330542454600Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Frequent itemsets mining is an important work in the field of data mining,it is the basis of association rule mining,clustering and classification.With the rapid development of information technology,many applications in the real world can produce large amounts of data streams in real time,continuously and quickly.Due to its own characteristics,frequent itemsets mining on data streams is facing huge challenges.In addition,because of subjective or objective reasons,a lot of data become uncertain in reality and they are usually in the form of probability to show their existence uncertainty.The introduction of uncertainty makes it hard for the traditional algorithm directly to apply to frequent itemsets mining on uncertain data stream,and the design of the summary data structure of certain data streams is no longer effective.Therefore,the research and design of efficient summary data structure and algorithm of frequent itemsets mining on uncertain data stream become very important.Based on this,in view of frequent itemsets problem under the environment of uncertain data stream,this paper carried out the following research work:1.the knowledge related to data mining is learned and the background and characteristics of the data streams are understood as well.In addition,the core idea of the current mining frequent itemsets algorithm in uncertain data streams is studied and mastered.2.Aiming at the low compression ratio of SUF-Tree which existed in the classic SUF-growth algorithm of frequent itemsets mining of uncertain data stream is too low,causing the mining algorithm of space-time consumption too much,this paper proposes a new calculate method of support-suffix support,and put forward the Tree structure P-Tree which can store summary information and the corresponding mining algorithm based on this.Finally,this article compares the consumption with SUF-growth algorithm and UDS-FIM algorithm by changing the minimum support,window size and the group size.The experimental results show that P-growth is superior to the other two algorithms in terms of space-time consumption.3.In allusion to the problem of minimum support is difficult to set reasonable in advance,combining with P-growth algorithm,this paper proposes a Top-K frequent M itemsets mining algorithm which based on the model of sliding window,namely TOPPT-growth algorithm.In order to improve the efficiency of mining algorithm,the paper provides two optimization strategies:(1)dynamic ascending mining threshold(2)adapt to ascend the pruning threshold to improve the accuracy and time efficiency of the algorithm.At last,the performance of the algorithm is analyzed in the experiment and the results show that under the premise of running time and memory overhead is reasonable,the TOPPT-growth algorithm can effective excavate Top-K frequent M itemsets in uncertain data stream.
Keywords/Search Tags:Data Mining, Frequent Itemsets, Uncertain Data Streams, Top-K Frequent Itemsets, Sliding Window
PDF Full Text Request
Related items