Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream

Posted on:2010-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:P Xi

Full Text:PDF

GTID:2178360272980293

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of information technology, many application areas now generate unbounded sequences of data that are strictly ordered in time and whose values change continuously. The data stream model was introduced to describe such data. Frequent itemset mining is an active topic in data stream mining, but the number of frequent itemsets and association rules it produces is often so large that the results are difficult to understand and use. Frequent closed itemset mining, a more advanced technique, has emerged in response.

CFI-Stream is an online algorithm for mining recent frequent closed itemsets, but it has two serious problems. First, its Add function is called recursively, and the number and depth of the calls grow exponentially with the length of the transaction, which creates a major performance bottleneck in both time and space. Second, CFI-Stream traverses the whole tree when it checks the closure of the largest itemset and its subsets, which leads to many unnecessary checks and hurts time efficiency.

To address these two problems, this thesis presents a new algorithm based on an ordered lexicographical tree with difference set nodes. The algorithm follows a divide-and-conquer strategy and mines each branch independently. When it moves from the end of one branch to another, it selects the appropriate subset as the argument of the recursive call according to the prefix on which the two branches differ, which greatly reduces the number and depth of recursive calls. The algorithm combines breadth-first and depth-first search. The depth-first search computes the intersection while recording the subset to be used for recursion. This subset is much shorter because it omits the entire prefix from the root node and keeps only the difference set. Experiments show that the new algorithm reduces time and space cost, especially on sparse datasets.
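As a rough illustration of why closed itemsets are a smaller result set than frequent itemsets, the following Python sketch enumerates the frequent closed itemsets of a small window of transactions by brute force. It is not the algorithm proposed in the thesis and not CFI-Stream: the function names (support, closure, frequent_closed_itemsets), the sample window, and the min_support value are all assumptions made for this example. It relies only on the standard definition that an itemset is closed when it equals the intersection of all transactions that contain it.

```python
from itertools import combinations


def support(itemset, window):
    """Count the transactions in the window that contain every item of itemset."""
    return sum(1 for t in window if itemset <= t)


def closure(itemset, window):
    """Intersect all transactions that contain itemset (None if none do)."""
    covering = [t for t in window if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def frequent_closed_itemsets(window, min_support):
    """Brute-force enumeration, exponential in the number of items.

    Illustrative only: a streaming algorithm would update the result
    incrementally instead of rescanning the window.
    """
    items = sorted(set().union(*window))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support(candidate, window)
            # Keep the candidate only if it is frequent and equals its own closure.
            if count >= min_support and closure(candidate, window) == candidate:
                result[candidate] = count
    return result


# Hypothetical window of four transactions over the items a, b, c, d.
window = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("c")]
closed = frequent_closed_itemsets(window, min_support=2)
```

In this sample window, with a minimum support of 2, the frequent itemsets are {a}, {b}, {c}, and {a, b}, but only {c} and {a, b} are closed: {a} and {b} each occur in the same three transactions as {a, b}, so reporting them separately adds no information.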

Keywords/Search Tags:

data stream, online mining, frequent closed itemset, ordered lexicographical tree, difference set node

Related items

1	The Research On The Algorithm About Online Mining Closed Frequent Itemsets Over Data Stream
2	Research On Mining Frequent Itemsets Over Data Stream
3	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
4	Research On The Algorithm Of Data Stream Frequent Itemsets Mining
5	Research On Frequent Itemsets Mining Algorithm In Data Stream
6	Research And Realization Of Parallel Algorithm For Mining Frequent Closed Itemsets
7	Research Of Closed Frequent Itemsets Mining Algorithm In Data Streams
8	Research On Mining Frequent Itemsets Algorithm Based On Bittable
9	Study On Mining Closed Frequent Itemset Based On Hadoop
10	An Algorithm For Mining Frequent Itemsets From Data Streams