Font Size: a A A

An Research On Mining Frequent Itemsets Algorithm Based On Titled-time Window

Posted on:2011-09-20Degree:MasterType:Thesis
Country:ChinaCandidate:Y H XuFull Text:PDF
GTID:2178330332460340Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In the information society, people have to process various information and data today. As the explosive increase of information, the scale of the data to be processed becomes larger and larger in many practical applications. These data continually arrive with high speed, huge volume and in time order. This data module is called data stream. Traditional frequent itemset mining is difficult to complete mining task in data stream for its characteristic of high flow speed and potentially infinite elements number. These challenges have attracted considerable attention of mammy people to do a large number of researches on frequent itemset mining. Now frequent itemset mining has become one of the most heated topics in the field of data stream.FP-stream is an online algorithm which can mine frequent itemset at multiple time granularities. As a classic mining algorithm, it has good performance on time. But the deficiency in the algorithm is as follows : first, fp-growth algorithm is used to generate frequent itemsets and their support count, which will consume a great deal of main memory and run time; second, in the whole mining process ,all historical data is stored in main memory and the need of memory will exponentially increase as time going on. So, the biggest deficiency in the algorithm is huge cost of main memory.According to the above-mentioned problem, in this paper a new data structure(LR-Trie tree and tree node) is used to store frequent itemsets and their tilted-time window table based on original algorithm. At meantime, the approach of vertical bit vector expression is introduced to store transaction data for improving efficiency of time and memory. Due to new structure of tree node, data querying and storing in sequence are completed conveniently. In addition, LR-Trie tree is divided into some different files stored in disk and the index list for the item and the file is created. We can import the file according need so that the cost of main memory is substantially reduced. As research shows that the improved algorithm can attain higher storage utilization at no cost in much more execution time. So, the algorithm is working well for the case that the system has high demand on main memory but do not on time.
Keywords/Search Tags:data stream, frequent itemset, vertical bit vector, LR-Trie tree, main memory cost
PDF Full Text Request
Related items