
Research On Count-based Algorithm For Mining Frequent Items Over Data Stream

Posted on: 2015-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Wu
GTID: 2268330425495906
Subject: Computer software and theory
Abstract/Summary:
With the development of computer information technology, a data processing model known as the data stream model has been proposed for fields such as network anomaly detection, real-time trading information, and sensor monitoring. Mining frequent items (and frequent itemsets) over data streams is one of the active research topics in data mining. Because data arrive at high speed and in large volume, an algorithm can process each element only once and cannot store all of the data, which makes designing mining algorithms for the data stream environment a challenging task. This thesis studies the mining of frequent items and frequent itemsets over data streams; its main contents and innovations are as follows.

First, after studying the article "the research based on the count of classic algorithms", an improved method based on a simple data model is put forward. The central idea of the improved algorithm is to maintain a sample set of the data: update operations keep frequent items in the set, non-frequent items are deleted from it, and the Top-k frequent items are output at the end. By processing the sample set in this way, the algorithm avoids the problem of items that had a high frequency early in the stream remaining in the set, and it achieves better mining accuracy.

Second, an improved algorithm and mining structure based on the classic FP-tree model for static data is proposed and combined with window techniques so that it can run over massive data streams. A preprocessing step saves the potential frequent items it generates, and the stored records serve as the input structure of the NFP-tree; all frequent itemsets are then produced by iterative mining. To reflect differences in when data appear, the algorithm assigns time-based weights, which lets it answer user queries for newly appearing frequent itemsets.
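The count-based Top-k approach described above can be illustrated with Space-Saving, a well-known classic count-based algorithm. The Python sketch below is not the thesis's improved algorithm, whose details the abstract does not give; the class `SpaceSaving` and its `capacity` and `top_k` names are hypothetical, chosen for illustration. It keeps a fixed-size set of counters, increments monitored items, and, when a new item arrives and the set is full, evicts the lowest-count item and lets the newcomer inherit that count.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Classic Space-Saving summary with a fixed number of counters.

    Each monitored item maps to (count, error): `count` may over-estimate
    the item's true frequency by at most `error`.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counters: Dict[Hashable, Tuple[int, int]] = {}

    def update(self, item: Hashable) -> None:
        if item in self.counters:
            # Monitored item: just increment its counter.
            count, err = self.counters[item]
            self.counters[item] = (count + 1, err)
        elif len(self.counters) < self.capacity:
            # Free slot: start monitoring the item with an exact count.
            self.counters[item] = (1, 0)
        else:
            # Full: evict the item with the smallest count; the newcomer
            # inherits that count (+1) and records it as its maximum error.
            # (Linear scan for clarity; real implementations use a
            # Stream-Summary structure for constant-time updates.)
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            min_count, _ = self.counters.pop(victim)
            self.counters[item] = (min_count + 1, min_count)

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k monitored items with the largest estimated counts."""
        ranked = sorted(self.counters.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, count) for item, (count, _) in ranked[:k]]


if __name__ == "__main__":
    summary = SpaceSaving(capacity=3)
    for element in "abacabdabe":
        summary.update(element)
    print(summary.top_k(2))  # [('a', 4), ('b', 3)]
```

With capacity 3 and the stream `abacabdabe`, item `e` ends with an estimated count of 3 even though it occurs once, because it inherited the evicted counter's value. Inflated or stale counters of this kind are the sort of inaccuracy that the improved algorithm's update-and-delete step on the sample set appears intended to reduce.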
Simulation experiments show that the improved algorithm performs well compared with other algorithms and that the method is suited to mining over data streams.
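The abstract states that the FP-tree-based algorithm works over a window and adds time-based weights so that newly appearing itemsets are favoured, but it does not give the weighting formula. The Python sketch below assumes a count-based sliding window holding the most recent `window_size` transactions and an exponential decay factor `decay`; both parameters and the class `WeightedWindow` are hypothetical. It computes only the time-weighted support of a single itemset and does not build the NFP-tree.

```python
from collections import deque
from typing import Deque, FrozenSet, Iterable


class WeightedWindow:
    """Keep the last `window_size` transactions and weight them by recency.

    Assumed weighting: the newest transaction has weight 1, the previous
    one `decay`, then decay**2, and so on.
    """

    def __init__(self, window_size: int, decay: float) -> None:
        if window_size <= 0 or not 0.0 < decay <= 1.0:
            raise ValueError("need window_size > 0 and 0 < decay <= 1")
        self.decay = decay
        self.window: Deque[FrozenSet[str]] = deque(maxlen=window_size)

    def add(self, transaction: Iterable[str]) -> None:
        # A deque with maxlen drops the oldest transaction automatically.
        self.window.append(frozenset(transaction))

    def weighted_support(self, itemset: Iterable[str]) -> float:
        """Weighted fraction of windowed transactions containing `itemset`."""
        target = frozenset(itemset)
        total = hit = 0.0
        # Walk from the newest transaction (age 0) to the oldest.
        for age, tx in enumerate(reversed(self.window)):
            weight = self.decay ** age
            total += weight
            if target <= tx:
                hit += weight
        return hit / total if total else 0.0


if __name__ == "__main__":
    win = WeightedWindow(window_size=3, decay=0.5)
    for tx in [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"b", "c"}]:
        win.add(tx)
    print(win.weighted_support({"b", "c"}))  # 1.5 / 1.75, about 0.857
```

Setting `decay` to 1.0 reduces this to ordinary relative support within the window; smaller values make recently appearing itemsets count for more, which is the effect the abstract describes.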
Keywords/Search Tags: data mining, data stream, Top-k, frequent items, FP-tree, frequent itemsets