Font Size: a A A

Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams

Posted on:2011-02-05Degree:MasterType:Thesis
Country:ChinaCandidate:H L HeFull Text:PDF
GTID:2178360302994425Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Recently, real-world and engineering applications often generate huge amount of data streams. Data stream has many characteristics which are different from the static data. How to mine frequent patterns over data stream with high-speed and high-efficiency is a significant problem. In this paper, the key point of research is the designing of frequent pattern algorithm which is based on the optimized FP-tree. The research results have a significant effect in these fields: electrical businesses, trade aptitude and decision-making.Firstly, in order to improve the mining efficiency of frequent patterns in data streams, an algorithm DS-FPM for mining frequent patterns in data streams is presented. A data structure DSFP-tree is constructed to store the potential frequent itemsets over the data stream. In order to make the mining result preserving the historical details and saving space, the fading factor is introducted to control the memory space of the algorithm within a certain scale. First, the algorithm DS-FPM, in which data streams are partitioned, samples from DSFP-tree generated by the earlier segment using fading factorλand gets potential frequent itemsets from the latest segment, then it inserts both of them into the new DSFP-tree, finally, it obtains frequent itemsets from DSFP-tree.Secondly, in order to reduce the memory space which is needed to store itemsets on data streams, a structure DSMFI_tree of storing potential maximal frequent itemsets is constructed. An algorithm DSMFI-Miner for mining maximal frequent itemsets on data streams is presented based on DSMFI_tree. The data stream is divided into a set of segments. DSMFI_tree is updated dynamically on each segment. The maximal frequent itemsets on the data stream can be rapidly found by searching DSMFI_tree.Lastly, DS-FPM and DSMFI-Miner algorithms are implemented with language of C. All of our experiments are performed on the datasets generated by IBM test data generator.
Keywords/Search Tags:Data stream, Frequent patterns, Summary data structures, FP-tree, Fading factor, Maximal frequent itemsets
PDF Full Text Request
Related items