Research And Application Of Mining Frequent Items Algorithm In Data Streams

Posted on:2008-09-26

Degree:Master

Type:Thesis

Country:China

Candidate:T C Wang

Full Text:PDF

GTID:2178360278953482

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

A data stream is an ordered sequence of items that arrives in timely order. Recently, the application of data stream mining is widely-used. It is challenge and hot to maintain frequent items over a dynamical data stream. Data stream frequent items mining is an important component of data stream mining.Research results of data stream mining algorithms mostly include hash algorithms and sampling algorithms. Firstly, these two types of classical algorithm for the main idea are discussed. Focus in this article is on the difference of frequent error bound,space requirement and time for per-item of algorithms. Secondly, EC algorithm of maintaining data stream frequent items is researched and it is discussed. Frequent error bound,space requirement and time for per-item of EC algorithm is better now, but the worst-case complexity is not considered and the frequency error bound of the results can be small. Thirdly, this paper proposed an improved algorithm for mining frequent item based on counter and localized principle. On the one hand, the method how to update the sample set of EC algorithm ischanged by a counter. The worst-case complexity is O(ε^-1) by it. On the other hand, if a item is arrived, it is probably arrived recently. So, the synopsis information is saved with history sample set. The frequent error bound is lower by dynamically update sample set and history sample set. It takes certain space to get much better accuracy. Theoretical analysis and experimental results show that the algorithm proposed in this paper is better, because of smaller frequent error bound and lower worst-time complexity. Lastly, the improved algorithm is realized in practical application on frequent items mining project of Public Security Bureau. When web pages are clicked, the continuous and huge clickstream data is arrived. It is impossible to save the whole data stream to disk. When data stream is arrived, firstly, data stream processing module responsible for receiving data and traffic control. Then, frequent items mining module responsible for efficiently processing data stream and saving the useful information to synopsis data structure. The interesting pages are rapidly returned by selecting synopsis data structure, when a query request is arrived.

Keywords/Search Tags:

Data Stream, Data Mining, ε-Approximate, Frequent Item

PDF Full Text Request

Related items

1	Research On And Implementation Of Frequent Item Set Mining System In Data Stream
2	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
3	Research On The Algorithm For Mining Frequent Items From Data Streams
4	Research On Frequent Item Mining And Correlation Analysis In Data Streams
5	Mining Frequent Itemsets Over Recent Data Stream
6	Research On Multi-stream Frequent Item Set Mining Algorithm
7	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
8	Research Of Webpage Hot Topic Retriving Technology Based On Data Stream Mining
9	Frequent Itemsets Mining Algorithm And Its Application In Data Flow
10	Research On The Mining Frequent Itemsets Over Data Stream