Font Size: a A A

Research On The Algorithm For Mining Frequent Items From Data Streams

Posted on:2013-10-31Degree:MasterType:Thesis
Country:ChinaCandidate:Q L MeiFull Text:PDF
GTID:2248330395490813Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology, stream data are widely applied in many fields, such as network-traffic monitoring, financial applications, communication data management, network security monitoring and sensor networks. In these applications, it is obviously much important for making research on this new kind of stream data and related technology. Comparing with traditional static data, streaming data are continuous, unlimited, and move at high speed with dynamic data distribution over time, All of these features of stream data make it difficult to apply traditional methods on this area. Therefore, mining certain and uncertain frequent items on stream data becomes the research hotspot in the area of data mining.We studied and analyzed the existing major techniques for mining data streams, including fading factor, Hash function, sliding window and efficient algorithms on mining uncertain stream data. We proposed more efficient algorithms to solve their disadvantages. Our major works in this paper are as follows.(1) Proposed an algorithm for mining frequent stream data items using hash function and fading factorMost existing data stream frequent item mining algorithms fail to emphasize the importance of current data. By adopting the fading factor, we present a time fading model which highlights the latest data and weakens that of "older" data. We propose a new algorithm, λ-HCount, to mine the frequent items in data stream. The algorithm adopts time fading factor λ to emphasize the importance of the relatively newer data, and records the densities of the data items in the data stream in Hash tables so as to mine the frequent items in the data stream. For a given error ε and a threshold of density S, our algorithm can mine ε-approximate frequent items with a predefined probability greater than p using e/εIn(-M/Inp)+r/S-εmemory space, here M is the number of different data items. Time complexity for processing one data item isO(1). Experimental results with synthetic and real data sets show that the algorithm outperforms other methods in terms of accuracy, memory requirement, and processing speed.(2)Poposed an algorithm for mining frequent stream data items based on variable windowWe propose a new algorithm, FCW_MRFI, to mine frequent data items in stream data based on variable windows. The FCW_MRFI algorithm can mine frequent item in any window of recent streaming data, whose given length is L. Meanwhile, it divides recent streaming data into several windows of variable length according to m, which is the number of the counter array. This algorithm can achieve smaller query error in recent windows, and can minimize the maximum query error in the whole recent streaming data. In order to compare its accuracy and recall rate with other existing methods, experiments with real data sets and synthetic data sets are conducted, which proves that FCW_MRFI algorithm offers much improved accuracy in recent frequent item mining in data stream.(3) Poposed an algorithm for mining frequent items in uncertain data streamWe proposed a new algorithm, SMFUD, for mining frequent items in uncertain data stream. Based on the characteristics of uncertain data in streaming data, the algorithm adopts sliding window to mine uncertain frequent items, which dramatically reduces candidate items by pruning strategies and dynamic programming. Based on the possible instance solving rules in slide windows, the algorithm can delete "older" tuples and adds new ones at very high speed. Experimental results show that this algorithm can effectively decrease the number of candidate items, reduce space complexity and increase the speed in frequent items query.
Keywords/Search Tags:Data mining, stream data, frequent item, fading factor, Hash function, uncertain data, sliding window, possible instance
PDF Full Text Request
Related items