Research On Count-based Algorithm For Mining Frequent Items Over Data Stream

Posted on:2015-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:M Wu

Full Text:PDF

GTID:2268330425495906

Subject:Computer software and theory

Abstract/Summary:

With the development of computer information technology, a data processing model called the data stream processing model has been proposed for fields such as network anomaly detection, real-time trading information, and sensor monitoring. Mining frequent items (and frequent itemsets) over data streams under this model is one of the hot research topics in data mining. Because the data arrive at high speed and in large volume, an algorithm can process each data element only once and cannot store all the data. Designing data mining algorithms for the data stream environment is therefore challenging. This thesis studies frequent item and frequent itemset mining over data streams. Its main contents and innovations are as follows.

First, after studying the classic count-based algorithms, an improved method is put forward based on a simple data model. The central idea of the improved algorithm is to maintain a sample set, keep the frequent items in it through update operations, delete the non-frequent items from the set, and finally output the Top-k frequent items. By processing the sample set in this way, the algorithm avoids the problem of items that had a high frequency early in the stream remaining in the result, and achieves better mining accuracy.

Second, an improved algorithm and mining structure based on the classic FP-tree model for static data is proposed and combined with window technology so that it can run on massive data streams. The algorithm saves the potential frequent items generated by a preprocessing procedure and uses the stored data records as the input to the NFP-tree structure. All frequent itemsets are finally produced through iterative mining. To reflect differences in the time at which data appear, the algorithm adds time-based weights, which meets the user's need to query newly appeared frequent itemsets.

Simulation experiments show that the improved algorithm performs well compared with other algorithms. The method is suitable for mining over data streams.
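
As an illustration of the count-based idea the abstract builds on, the following is a minimal sketch of the classic Misra-Gries summary, followed by a Top-k selection. It is not the improved algorithm of the thesis, whose update and deletion rules are not given in the abstract. The function names `misra_gries` and `top_k`, the parameter `capacity`, and the sample stream are assumptions made for this example.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def misra_gries(stream: Iterable[Hashable], capacity: int) -> Dict[Hashable, int]:
    """Single-pass summary that keeps at most `capacity` counters.

    Assumption: this is the classic Misra-Gries scheme, used here only
    to illustrate count-based mining; it is not the thesis's algorithm.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Update operation: the item is already monitored.
            counters[item] += 1
        elif len(counters) < capacity:
            # There is still room, so start monitoring the new item.
            counters[item] = 1
        else:
            # The summary is full: decrement every counter and delete
            # the items whose count drops to zero (non-frequent items).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def top_k(counters: Dict[Hashable, int], k: int) -> List[Tuple[Hashable, int]]:
    """Return the k monitored items with the largest estimated counts."""
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:k]


# Hypothetical stream of single-character items.
stream = list("aabacabdaaeb")
summary = misra_gries(stream, capacity=3)
print(top_k(summary, k=2))  # [('a', 5), ('b', 2)]
```

The stored counts are estimates: each one undercounts the true frequency by at most n / (capacity + 1), where n is the number of items seen so far. In the example the true count of `a` is 6 and the estimate is 5. Memory stays bounded by `capacity` regardless of stream length, which matches the single-pass, limited-storage constraint described in the abstract.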

Keywords/Search Tags:

data mining, data stream, Top-k, frequent items, FP-tree, frequent itemsets

Related items

1	FP-Tree Based Mining Frequent Itemsets Over Data Streams
2	The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data
3	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
4	Research On Data Items Frequent Itemsets Mining Algorithm Based On Sliding Window
5	The Research Of Frequent Itemsets Mining Algorithm Over Data Streams
6	Research On Mining Frequent Pattern Based On The Optimized FP-Tree In Data Streams
7	Research On The Algorithm Of Data Stream Frequent Itemsets Mining
8	Research On Frequent Pattern Mining Algorithm Oriented To Data Stream
9	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application
10	Research On Algorithms For Mining Frequent Patterns In Data Streams