Font Size: a A A

Research And Application Of Frequent-pattern Mining Methods In Data Stream

Posted on:2009-07-10Degree:MasterType:Thesis
Country:ChinaCandidate:B ZhuangFull Text:PDF
GTID:2178360242494597Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data mining technology has been maturating through over ten year's research. As a basic task of data mining, frequent pattern mining has been a hot spot all the way and been researched widespread and profound. However, a new data stream model arises along with the development of network, telecommunication and sensors technology. Data stream is a sequence of elements, huge amount, continuous, infinite and fast changing. As a rule, storing the whole data stream is unfeasible, and it is impossible or too expensive to access again once the data is processed. To discover knowledge or patterns from data streams, it is necessary to develop single-scan, on-line, multilayer and multidimensional methods of stream processing and analyzing.Data stream mining has been a focused theme in the field of data mining for the last years, and abundant applications have promoted the algorithms of data stream mining proposed one after another. Frequent pattern mining is an important subject of data stream mining, and it has been widely used in solving the problems of association rule discovering, iceberg query, classification, cluster and so on. For the peculiarity of data stream, frequent pattern mining in data stream can only scan the data one time, therefore, higher demands are put forward to the algorithms of frequent-pattern mining. Most existing algorithms are restricted because of multi-scan of the whole data set. It is different from traditional frequent-pattern mining that mining frequent patterns in data stream ordinarily allows a small enough error in the result fitting in with the infiniteness of stream and improving the efficiency of mining.After summarized the current studies of frequent-pattern mining in data stream in this thesis, two classic algorithms, Lossy Counting and FP-stream, are analyzed, which can all mine the complete itemsets of frequent patterns but have different characteristics. The Lossy Counting algorithm is space efficient because its candidate itemsets are stored in auxiliary memory. But it can only count the frequent-patterns of the whole stream, which can not distinguish the effect of data in history. Algorithm FP-stream has a good dynamic characteristic because of using tilted-time window to maintain the frequency of each pattern in multiple time granularities. However, it uses amount of memory in lower support.On the basis of analyzing of existing algorithms, a TWCT tree structure is proposed with tilted-time window framework integrated, which can maintains the complete set of frequent patterns at multiple time granularities. And we design the sequential update and delete algorithms for the structure, which makes it can be saved in auxiliary storage in order to reduce the algorithms'requirements of the main memory effectively. Take advantage of this characteristic, TWCT-Stream, a frequent pattern mining algorithm in data stream, is proposed. And its pattern growth algorithm, TWCT-Growth, generates frequent patterns in lexical order suitable for the sequential updating of TWCT structure. Experiments have proved its memory requirements lower than the same kind of algorithms like FP-stream and so on.We propose VDT-SW structure based on sliding window model, which has characteristics of horizontal and vertical data format. This structure is convenient for updating data in sliding window, and it can integrate with various kinds of frequent pattern mining algorithms. And a lower time complexity algorithm of fast itemset count query, named VDT-SW-Query, is proposed. Applying frequent pattern mining in data stream based on VDT-SW structure in the field of telecommunication data analyzing, a data stream processing system model is designed, which can offer valuable information for operations and maintenance of communication network through the mining of complaint records.Mining frequent patterns in data stream has been one of the focuses of data mining. The thesis makes a prospect of further studies of previous work.
Keywords/Search Tags:data mining, data stream mining, frequent pattern, tilted-time window, sliding window
PDF Full Text Request
Related items