Font Size: a A A

Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream

Posted on:2010-07-11Degree:MasterType:Thesis
Country:ChinaCandidate:P XiFull Text:PDF
GTID:2178360272980293Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With rapid development of information technology, a kind of infinite sequence data is generated in many areas. It is strictly ordered in the dimension of time, and it is constantly changed in value, so a model called data stream is introduced. Frequent itemset mining in data stream is emerging in the research of data stream mining, but the number of frequent itemsets and association rules mined is often staggering, and they are difficult to understand and use. A more advanced mining technology called frequent closed itemset mining is emerging.CFI-Stream is an online mining algorithm for recent frequent closed itemsets; it remains two serious problems. First, there is a big performance bottleneck - Add function was recursively called, and the depth and length of the calling is exponentially increased with the length of the transaction, which greatly affects the time and space complexity. Second, CFI-Stream traverses the whole tree when it checks the closure of the largest and its subsets, resulting in many unnecessary checks and impact on time efficiency.According to the two problems above, this thesis presents a new algorithm using an ordered lexicographical tree with difference set nodes. It uses divide and conquer strategy, which mines each branch independently. When it's transferred from the end of one branch to another, it will choose the right subset as arguments of recursive calls according to the different prefix of the two branches, greatly reducing the number and depth of recursion. It combines breadth-first search and depth-first search. The depth-first search strategy guarantees that it computes the intersection meanwhile it remembers the subset for recursion. This subset is shortened to a large extent because it omits all the prefix from the root node, only reserving the difference set. Experiments show that the new algorithm reduces the time and space complexity, especially in the sparse dataset environment.
Keywords/Search Tags:data stream, online mining, frequent closed itemset, ordered lexicographical tree, difference set node
PDF Full Text Request
Related items