Font Size: a A A

Research On Frequent Pattern Mining Algorithms In Uncertain Data Streams

Posted on:2017-07-24Degree:MasterType:Thesis
Country:ChinaCandidate:C Q DuFull Text:PDF
GTID:2358330482991350Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of cloud computing, big data and Internet of things, the existing form of the data has changed. In many practical applications, most of the data are in stream form, such as web search log data, sensor network data, climate environment monitoring data, financial transaction data and communication recording data, and so on. Moreover, as the data collection technology continues to progress, the data collected in the data streams often shows uncertainty, that is, the emergence of data is usually accompanied by a certain probability. The traditional frequent pattern mining algorithms mainly focuses on deterministic data, including the deterministic static data and deterministic dynamic data streams. However, for uncertain data streams, frequent pattern mining algorithm is not yet mature. Besides the characteristics of infinity, rapidity and real-time of the traditional data streams, the problem of uncertainty for uncertain data streams needs to be solved urgently. This makes it difficult to use traditional mining algorithm of deterministic data stream directly on uncertain data stream, and the traditional summary data structure can not effectively store the uncertain data. Therefore, it is necessary to design the optimized summary data structures suitable for uncertain data streams, and propose efficient algorithms to mine frequent pattern on uncertain data streams.In this paper, the problems of frequent pattern mining on uncertain data streams are introduced in detail, and the advantages and disadvantages of some classical algorithms in this field are analyzed, then a new algorithm BSUF-mine for mining frequent pattern based on sliding window is proposed. The algorithm designs a summary data structure suitable for the characteristics of uncertain data streams. The structure can efficiently store the summary data information of the uncertain data streams, and improve the time and space efficiency of frequent patterns mining. Moreover, in view of the problem in the calculation of the expected support, a new algorithm WBSUF-mine for frequent pattern mining is proposed, which is based on the weight factor of the data item. The algorithm can effectively dig out the frequent patterns with longer length, and ensure the integrity of the frequent pattern mining results. The main work of this paper is as follows:(1) This paper introduces the theory of uncertain data in detail, including the production causes, the performance and the processing models. Meanwhile, analysis and discuss the advantages and disadvantages of the classical algorithm in the frequent pattern mining area, including frequent pattern mining algorithms of uncertain data, deterministic data streams, and data streams with uncertainty.(2) Given the problem that frequent pattern mining algorithm SUF-growth of the uncertain data streams has a low degree of compression summary data structure, and it will generate a large number of redundant nodes in memory, a data structure BSUF-stream with high compression degree is designed, and an efficient algorithm BSUF-mine for mining frequent pattern on uncertain data streams is proposed. The main ideas of the algorithm and the summary data structure of BSUF-stream are introduced in detail, and the experimental results are compared with the SUF-growth algorithm from three aspects of memory consumption, time consumption and scalability. The experimental results show that the BSUF-mine algorithm can effectively reduce the number of redundant nodes in memory, and obtain high memory utilization and time efficiency.(3) In view of the problem that the expected support of the item set is calculated by multiplying the probability of the data item in a simple way, without sufficient consideration of the weight factor of the data item, missing the frequent item sets with longer length, combined with the algorithm BSUF-mine, an efficient algorithm WBSUF-mine for frequent pattern mining is proposed. The main idea of the algorithm and the summary data structure used in the WBSUF-stream are introduced in detail, and the experimental results are compared with the BSUF-mine algorithm from three aspects of memory consumption, time consumption and the distribution of frequent patterns. The experimental results show that, under the premise of ensuring reasonable memory cost and running time, WBSUF-mine algorithm can effectively mining frequent patterns with longer length, and improve the integrity of frequent pattern mining results, so as to meet the actual needs of some application scenarios.
Keywords/Search Tags:data mining, frequent itemsets, uncertain data, data streams, summary data structure
PDF Full Text Request
Related items