Font Size: a A A

Research On Mining Frequent Closed Itemsets From A Sliding Window Over Data Streams

Posted on:2009-01-03Degree:MasterType:Thesis
Country:ChinaCandidate:J LiFull Text:PDF
GTID:2178360272455179Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of information technology, especially the emerging of the network technology, our abilities to collect, store and transfer data have been improved dramatically. Comparing to the explosive growth of data, our needs for decision relevant knowledge are not satisfied yet. Knowledge discovery and data mining technology is an important approach to address this problem. Frequent pattern mining is a basic problem of data mining, including mining transactions, sequences, trees and graphs. The algorithm for it has been prevalently used in many other data mining task, such as association analysis, period's analysis, maximal and closed patterns, query, classification and index technology etc. Since it lays groundwork for other problem and its intrinsic complexity, the algorithm for frequent pattern miming has become the focus of many research workers.Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. This paper proposes an incremental maintaining algorithm which based on Moment algorithm. Firstly, algorithm incremental updates in data streams via a sliding window, an effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. Then, paper designed a data structure New-CFIT for data streams which are huge and rapidly coming, New-CFIT tree can be incrementally updated and the frequent closed itemsets in a sliding window can be rapidly found. Secondly, a hash-based technique was adopted to reduce the size of closed itemsets. Finally, in order to check the result's accuracy, an compression-tree is used for checking, The purpose is perform two kinds of closure checking: (1) superset checking, which checks if this new frequent itemset is a superset of some already found closed itemsets with the same support, and (2) subset checking, which checks whether the newly found itemset is a subset of an already found closed itemset with the same support.Experiments show that the proposed algorithm not only attain highly accurate mining results, but also run significant faster and consume less memory than existing algorithm moment for mining closed frequent itemsets over recent data streams.
Keywords/Search Tags:data stream, closed frequent itemsets, data stream mining, sliding window, moment algorithm
PDF Full Text Request
Related items