Font Size: a A A

Research And Application Of Mining Frequent Items Algorithm In Data Streams

Posted on:2008-09-26Degree:MasterType:Thesis
Country:ChinaCandidate:T C WangFull Text:PDF
GTID:2178360278953482Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
A data stream is an ordered sequence of items that arrives in timely order. Recently, the application of data stream mining is widely-used. It is challenge and hot to maintain frequent items over a dynamical data stream. Data stream frequent items mining is an important component of data stream mining.Research results of data stream mining algorithms mostly include hash algorithms and sampling algorithms. Firstly, these two types of classical algorithm for the main idea are discussed. Focus in this article is on the difference of frequent error bound,space requirement and time for per-item of algorithms. Secondly, EC algorithm of maintaining data stream frequent items is researched and it is discussed. Frequent error bound,space requirement and time for per-item of EC algorithm is better now, but the worst-case complexity is not considered and the frequency error bound of the results can be small. Thirdly, this paper proposed an improved algorithm for mining frequent item based on counter and localized principle. On the one hand, the method how to update the sample set of EC algorithm ischanged by a counter. The worst-case complexity is O(ε-1) by it. On the other hand, if a item is arrived, it is probably arrived recently. So, the synopsis information is saved with history sample set. The frequent error bound is lower by dynamically update sample set and history sample set. It takes certain space to get much better accuracy. Theoretical analysis and experimental results show that the algorithm proposed in this paper is better, because of smaller frequent error bound and lower worst-time complexity. Lastly, the improved algorithm is realized in practical application on frequent items mining project of Public Security Bureau. When web pages are clicked, the continuous and huge clickstream data is arrived. It is impossible to save the whole data stream to disk. When data stream is arrived, firstly, data stream processing module responsible for receiving data and traffic control. Then, frequent items mining module responsible for efficiently processing data stream and saving the useful information to synopsis data structure. The interesting pages are rapidly returned by selecting synopsis data structure, when a query request is arrived.
Keywords/Search Tags:Data Stream, Data Mining, ε-Approximate, Frequent Item
PDF Full Text Request
Related items