In the information society, people have to process various information and data today. As the explosive increase of information, the scale of the data to be processed becomes larger and larger in many practical applications. These data continually arrive with high speed, huge volume and in time order. This data module is called data stream. Traditional frequent itemset mining is difficult to complete mining task in data stream for its characteristic of high flow speed and potentially infinite elements number. These challenges have attracted considerable attention of mammy people to do a large number of researches on frequent itemset mining. Now frequent itemset mining has become one of the most heated topics in the field of data stream.FP-stream is an online algorithm which can mine frequent itemset at multiple time granularities. As a classic mining algorithm, it has good performance on time. But the deficiency in the algorithm is as follows : first, fp-growth algorithm is used to generate frequent itemsets and their support count, which will consume a great deal of main memory and run time; second, in the whole mining process ,all historical data is stored in main memory and the need of memory will exponentially increase as time going on. So, the biggest deficiency in the algorithm is huge cost of main memory.According to the above-mentioned problem, in this paper a new data structure(LR-Trie tree and tree node) is used to store frequent itemsets and their tilted-time window table based on original algorithm. At meantime, the approach of vertical bit vector expression is introduced to store transaction data for improving efficiency of time and memory. Due to new structure of tree node, data querying and storing in sequence are completed conveniently. In addition, LR-Trie tree is divided into some different files stored in disk and the index list for the item and the file is created. We can import the file according need so that the cost of main memory is substantially reduced. As research shows that the improved algorithm can attain higher storage utilization at no cost in much more execution time. So, the algorithm is working well for the case that the system has high demand on main memory but do not on time. |