The Research Of Frequent Itemsets Mining Algorithm Over Data Streams

Posted on: 2010-01-15 | Degree: Master | Type: Thesis
Country: China | Candidate: H X Li | Full Text: PDF
GTID: 2248330395957606 | Subject: Applied Mathematics
Abstract/Summary:
With the development of the Internet, the world has entered the information age, and data streams are now generated in many areas. The basic problem of frequent itemset mining over data streams is how to mine frequent itemsets quickly and approximately using limited storage space. This thesis studies that problem in depth and makes the following contributions:

(1) Based on an analysis of existing algorithms, the idea of a variable minimum support threshold is introduced, and an algorithm, MRFIVMST, is designed that uses a variable minimum support threshold to mine the recent frequent itemsets over data streams. The performance of this algorithm is verified by experiments.

(2) Based on an analysis of Lossy Counting and estDec, and taking users' interest into account, the FP-Tree structure is improved into the NFI-Tree structure and the count decay method of the estDec algorithm is improved at the same time. On this basis an algorithm, FIMVES, is designed for mining frequent itemsets over data streams with variable count errors. The algorithm guarantees that the supports of frequent itemsets of different lengths are strictly controlled within a user-specified minimum support threshold, so that the errors in the supports of the mining results satisfy the following conditions: (a) the count of a 1-itemset has no error; (b) the maximal count error of a 2-itemset is no more than ε; (c) the maximal count error of a k-itemset (k > 2) is no more than 2ε. The correctness of this conclusion is proved theoretically. The algorithm is then applied to simulated datasets. The experimental results show that frequent itemsets that interest users can be found correctly, and that processing time and storage consumption are reduced as well.
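For readers unfamiliar with Lossy Counting, one of the baseline algorithms named in the abstract, the following is a minimal sketch of its classic single-item form. It is not the thesis's MRFIVMST or FIMVES algorithm, and it does not use the NFI-Tree; the function names `lossy_counting` and `frequent_items` and the toy stream are assumptions made for illustration only.

```python
import math


def lossy_counting(stream, epsilon):
    """Approximate item counts over a stream with error parameter epsilon.

    Illustrative sketch of single-item Lossy Counting; names are hypothetical.
    Each stored count underestimates the true count by at most epsilon * n.
    """
    width = math.ceil(1 / epsilon)  # number of items per bucket
    counts = {}                     # item -> [count, max possible undercount]
    n = 0                           # items seen so far
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At a bucket boundary, prune entries that cannot be frequent.
            stale = [k for k, (c, d) in counts.items() if c + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_items(counts, n, min_support, epsilon):
    """Report items whose stored count is at least (min_support - epsilon) * n."""
    threshold = (min_support - epsilon) * n
    return {k: c for k, (c, d) in counts.items() if c >= threshold}


if __name__ == "__main__":
    # Toy stream, assumed for illustration: 'a' is 50%, 'b' is 30%, 'c' and 'd' 10% each.
    stream = ["a", "a", "a", "a", "a", "b", "b", "b", "c", "d"] * 10
    counts, n = lossy_counting(stream, epsilon=0.1)
    print(frequent_items(counts, n, min_support=0.2, epsilon=0.1))
```

The sketch keeps memory bounded by pruning low-count entries at every bucket boundary, which is the trade-off between storage and count error that the abstract's error bounds (a) to (c) refine for itemsets of different lengths.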
Keywords/Search Tags: data stream mining, frequent itemset mining, frequent items