
Research On Algorithm For Mining Top-k Frequent Closed Itemsets Over Data Stream

Posted on: 2013-12-06    Degree: Master    Type: Thesis
Country: China    Candidate: C Y Lv    Full Text: PDF
GTID: 2248330395485990    Subject: Computer application technology
Abstract/Summary:
Much of the information in real life is generated in the form of data streams, for example sensor network monitoring data, network security monitoring data, web click streams, and weather monitoring and analysis data, so data stream processing has a broad range of applications. Because data streams are continuous, unknown in advance, and potentially infinite, traditional data mining algorithms cannot be applied to them directly. How to mine and manage data streams effectively has therefore attracted the attention of many researchers and become a hot research topic, and frequent itemset mining is an important part of data stream processing.

This thesis first gives a brief introduction to data mining technology and describes and analyzes the classical algorithms. To avoid having to set an appropriate minimum support threshold, and to make the frequent itemsets easier to understand, it proposes an algorithm for mining Top-k frequent closed itemsets over a data stream. The algorithm mines frequent closed patterns section by section in order to find the k most frequent closed itemsets in a sliding window, and it restricts the length of the itemsets to better meet users' needs. Because the algorithm does not process itemsets that fall outside the user-specified length, some loss of accuracy is inevitable, but mining is also faster. Users need to balance speed against accuracy according to the needs of the application. Simulation experiments, carried out to verify the effectiveness of the algorithm, show that it has good time and space efficiency and that the decline in accuracy caused by the length restriction can be controlled by the user, so the algorithm is better suited to data stream mining tasks.

To handle multiple data streams and improve the overall efficiency of the algorithm, several key procedures, such as polling, pre-processing of data streams, adding new data streams, and removing old data streams, are introduced into the algorithm for mining frequent closed patterns in the basic window, forming a strategy for handling multiple data streams. Finally, the algorithm is implemented in the MapReduce programming framework to build a prototype system for mining frequent itemsets from multiple streams. The system first divides each data stream into smaller static splits, then mines frequent itemsets block by block and makes the results available to user queries. Experimental results show that the system can process multiple data streams efficiently at the same time, performs well in time and space efficiency, system resource utilization, and scalability, and can meet the requirements of large-scale applications that mine frequent itemsets from multiple data streams.
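
To make the central notion concrete, the following is a minimal brute-force sketch of finding the k most frequent closed itemsets in a sliding window under a length restriction. It is not the section-by-section algorithm proposed in the thesis, which the abstract does not specify. The function name `top_k_closed_itemsets`, the parameters `min_len` and `max_len`, and the sample stream are all hypothetical and chosen only for illustration. An itemset is treated as closed when it equals the intersection of all window transactions that contain it.

```python
from collections import Counter, deque
from itertools import combinations


def top_k_closed_itemsets(window, k, min_len=1, max_len=3):
    """Return the k most frequent closed itemsets in `window` (brute force).

    Only itemsets whose length lies in [min_len, max_len] are considered,
    mirroring the user-specified length restriction (hypothetical parameters).
    """
    transactions = [frozenset(t) for t in window]

    # Count the support of every candidate itemset within the length bounds.
    support = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(min_len, min(max_len, len(items)) + 1):
            for combo in combinations(items, size):
                support[frozenset(combo)] += 1

    # Keep an itemset only if it equals its closure, i.e. the intersection
    # of all transactions in the window that contain it.
    closed = []
    for itemset, count in support.items():
        covering = [t for t in transactions if itemset <= t]
        if frozenset.intersection(*covering) == itemset:
            closed.append((sorted(itemset), count))

    # Highest support first; ties are broken alphabetically for determinism.
    closed.sort(key=lambda pair: (-pair[1], pair[0]))
    return closed[:k]


# Hypothetical stream; the deque keeps only the 4 most recent transactions.
stream = [["x"], ["a", "b"], ["a", "b", "c"], ["a", "b"], ["c"]]
window = deque(maxlen=4)
for transaction in stream:
    window.append(transaction)

result = top_k_closed_itemsets(window, k=2)
print(result)  # [(['a', 'b'], 3), (['c'], 2)]
```

In this window, `a` and `b` each occur three times but always together, so neither is closed on its own and only `{a, b}` is reported. Tightening the length bounds shrinks the candidate set, which is the source of the speed and accuracy trade-off described in the abstract.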
Keywords/Search Tags: data stream, frequent itemsets, sliding window, Top-k frequent closed itemsets