Font Size: a A A

Research Of Closed Frequent Itemsets Mining Algorithm In Data Steams

Posted on:2010-09-08Degree:MasterType:Thesis
Country:ChinaCandidate:X Q LiFull Text:PDF
GTID:2218330371950285Subject:Applied Mathematics
Abstract/Summary:PDF Full Text Request
A data stream was an ordered sequence of items that arrives in timely order. Different from data in traditional static databases, data streams were continuous, unbounded, and usually came with high speed and had a data distribution that often changes with time, which makes traditional mining algorithms be difficult to deal with data stream. In addition, due to the great number of frequent itemsets in data stream, a huge number of association rules were generated, which included many redundant and useless ones. Also frequent itemsets mining required a lot of time and space utilization. Closed frequent patterns can not only ensure complete information, but also reduce the size of the frequent patterns greatly. Therefore, closed frequent itemsets mining had become one of hot research topics in data streams.In this paper, closed frequent pattern mining in data stream had been studied. Works are as follows:First, some classical mining algorithms related to frequent itemsets and closed frequent itemsets were studied. Then detailed analysis of CFI-Stream algorithm was introduced. Furthermore, some existing problems were advanced, and some improve-ments about the CFI-Stream were given as well. Finally, we proposed a new algorithm accordingly. In this algorithm, decayed window mechanism and approximate estima-tion methods were used to solve problems of locality of mining and data turbulence. So that efficiency of the mining was enhanced.Second, we advanced the algorithm of data stream closed frequent itemsets, which was called DSSCI in short. Through discussing the relationship between itemsets, DSSCI-Tree was constructed to store all the closed frequent itemsets of data stream in current sliding window, so as to save storage space. During construction period, strategy of depth-first search was adopted to ensure no repeated nodes were constructed. The mechanism of transaction as the basic unit of the sliding window was used to update DSSCI-Tree, so that closed frequent itemsets could be generated and mined online directly, by which the efficiency was improved.Finally, the testing datasets were analyzed to prove feasibility of DSSCI algorithm. Then DSSCI algorithm and Moment algorithm are compared using generated datasets. The experimental results show that the DSSCI performed better in time efficiency and space occupation.
Keywords/Search Tags:data stream mining, frequent itemsets, closed frequent itemset, algorithm
PDF Full Text Request
Related items