Font Size: a A A

The Research On The Algorithm About Online Mining Closed Frequent Itemsets Over Data Stream

Posted on:2013-09-19Degree:MasterType:Thesis
Country:ChinaCandidate:P Y WangFull Text:PDF
GTID:2248330377958407Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
In recent years, along with the high-speed development of the computer memory andNetwork communicational technology,the data stream appeared gradually in the daily lifeeach domain, for instance large-scale supermarket vending record, ambient temperatureexamination data, exchange stock price information and so on. The people need to carry onreal-time, the continual collection and the analysis to the magnanimous dynamic data, andthen online mining frequent pattern over data stream obtains more and more attention.Compares with the traditional static state database, the data stream has the characteristicwhich the incessancy, the high speed movement, infinite arrive. Data renews unceasinglyalong with the passage of time, but the user only will usually pay attention to the valuablepattern of the near future.What this article studies is a principal aspect of mining frequent itemset over data stream:mining closed frequent itemset. It was proposed to aim in mining frequent itemset over datastream to obtain the massive redundancy frequent itemsets, creates memory excessively manyconsumptions and the mining speed enormous drop. The closed frequent itemset includes allfrequent itemsets’ complete collection which mined, thus avoids the redundancy frequentitemsets production, may save the storage space greatly, raises the mining efficiency, but willnot lose any useful information.The data stream fast infinite characteristic and application domain’s unceasing increasing,enables online mining technology over data stream to have the challenging more and more.This article proposes one kind of new mining algorithm called CMNL-SW (Closed Map andNum List-Sliding Window), it continues to use the sliding window technology in the Momentalgorithm and the method that only maintains the closed itemset information in theCFI-Stream algorithm, but what is different with it, the CMNL-SW algorithm does not needto product transaction’s subset, also does not need to search the superset of each subset. Onconcrete algorithm use data structure Closed Map to store the closed itemset mined and NumList save all different items’ serial numbers, Via the simple union of item number containedwithin a new arriving or an old deleting transaction and intersection with certain previousclosed itemsets once, it incrementally updates the current sliding window that makes theclosed frequent itemsets can be output in real time based on any user’s specified thresholds.Theoretical analysis and the experimental results of the real datasets Mushroom、 Retail-chain and artificially synthesized dataset T40I10D100K show that the proposedmethod is superior to that of state-of-the-art algorithm Moment and CFI-Stream in terms oftime and space efficiency, and has good scalability as the number of transactions processedincreases and adapts very rapidly to the change in data streams.
Keywords/Search Tags:frequent itemset, closed frequent itemset, data stream mining, sliding window, online mining
PDF Full Text Request
Related items