An Research On Mining Frequent Itemsets Algorithm Based On Titled-time Window

Posted on:2011-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Xu

Full Text:PDF

GTID:2178330332460340

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

In the information society, people have to process various information and data today. As the explosive increase of information, the scale of the data to be processed becomes larger and larger in many practical applications. These data continually arrive with high speed, huge volume and in time order. This data module is called data stream. Traditional frequent itemset mining is difficult to complete mining task in data stream for its characteristic of high flow speed and potentially infinite elements number. These challenges have attracted considerable attention of mammy people to do a large number of researches on frequent itemset mining. Now frequent itemset mining has become one of the most heated topics in the field of data stream.FP-stream is an online algorithm which can mine frequent itemset at multiple time granularities. As a classic mining algorithm, it has good performance on time. But the deficiency in the algorithm is as follows : first, fp-growth algorithm is used to generate frequent itemsets and their support count, which will consume a great deal of main memory and run time; second, in the whole mining process ,all historical data is stored in main memory and the need of memory will exponentially increase as time going on. So, the biggest deficiency in the algorithm is huge cost of main memory.According to the above-mentioned problem, in this paper a new data structure(LR-Trie tree and tree node) is used to store frequent itemsets and their tilted-time window table based on original algorithm. At meantime, the approach of vertical bit vector expression is introduced to store transaction data for improving efficiency of time and memory. Due to new structure of tree node, data querying and storing in sequence are completed conveniently. In addition, LR-Trie tree is divided into some different files stored in disk and the index list for the item and the file is created. We can import the file according need so that the cost of main memory is substantially reduced. As research shows that the improved algorithm can attain higher storage utilization at no cost in much more execution time. So, the algorithm is working well for the case that the system has high demand on main memory but do not on time.

Keywords/Search Tags:

data stream, frequent itemset, vertical bit vector, LR-Trie tree, main memory cost

PDF Full Text Request

Related items

1	A Method Based On The Vertical Division Of Data Stream Frequent Itemset Mining Algorithm
2	Research On Frequent Itemsets Mining Algorithm In Data Stream
3	Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream
4	Research On The Algorithm Of Data Stream Frequent Itemsets Mining
5	Research On Mining Frequent Itemsets Over Data Stream
6	Research On Data Items Frequent Itemsets Mining Algorithm Based On Sliding Window
7	Paralleled Frequent Itemset Mining And Research On Data Stream
8	Research And Implementation Of Frequent Itemset Mining Over Data Stream
9	The Research On The Algorithm About Online Mining Closed Frequent Itemsets Over Data Stream
10	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data