Font Size: a A A

FP-Tree Based Mining Frequent Itemsets Over Data Streams

Posted on:2010-01-07Degree:MasterType:Thesis
Country:ChinaCandidate:C HuoFull Text:PDF
GTID:2178360302959185Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
By analyzing data streams frequent itemsets mining situation of foreign and domain, we aware that there are many problems in the previous algorithms for mining frequent itemsets. Simple algorithms of frequent itemsets mining without adopting the constraint methods may generate very huge frequent itemsets. Only inserting the incoming data streams is considered in most incremental mining algorithms. However, few algorithms refer to the situation that when the new data streams are inserted into and the expired data streams are deleted from current sliding window. Pure frequent itemsets mining algorithm often ignores the temporal character of the result. The paper has mainly focused on how to mine frequent itemsets based on FP-Tree data structure from data streams. The solving of these problems has important meaning for e-commerce, Business Intelligence, Market decision-making and so on.Firstly, an efficient closed frequent itemsets mining algorithm HCFI is presented. This algorithm based on sliding window model limits expired data on the impact of mining results. Closed itemsets growing strategy is adopted in this algorithm to mining closed itemsets from current sliding window. This technique strengthens the real-time character of the result. The performance of the algorithm is better than CFI-Stream.Secondly, a maximal frequent itemsets mining algorithm SW-MFI is proposed. The algorithm acquires all the frequent itemsets without the frequency of the frequent itemsets. The algorithm gains better performance than DSM-MFI.Lastly, one constrained frequent itemsets mining algorithm ConFI-SW is proposed. There are two parts in the procedure of constrained frequent itemsets mining in sliding window. They are sliding window maintenance and constrained frequent itemsets mining. The algorithm ConFI-SW gains better performance both in time and space the other methods.We implement the above three algorithms with language of C. All of our experiments are performed on the real life datasets BMS-WebView-1 and BMS-WebView-2 to execute mining closed frequent itemsets and maximal frequent itemsets from sliding window over data streams. The experimental results show the feasibility and effectiveness of our algorithms.
Keywords/Search Tags:Data stream, FP-Tree, sliding window, closed frequent itemsets, maximal frequent itemsets, constrained-based frequent itemsets
PDF Full Text Request
Related items