Font Size: a A A

Research On Data Items Frequent Itemsets Mining Algorithm Based On Sliding Window

Posted on:2019-06-03Degree:MasterType:Thesis
Country:ChinaCandidate:F T LiFull Text:PDF
GTID:2348330566959245Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of computer technology and information technology,data flow has been widely used in anomaly detection and alarming in network traffic management and transaction processing in the retail industry.Data flow analysis and mining has become a hot research issue.Among them,mining data stream frequent itemsets is one of the most basic problems in data stream mining,so frequent data itemsets mining has become the focus of attention.Frequent patterns in the data stream are associated with a particular window,so the window model is critical for frequent patterns in the data stream.The usual window model is divided into landmark windows,decay windows,and sliding windows.Because outdated transactions in the first and second models have a certain impact on the mining results.Therefore,people are most concerned with the sliding window model.The frequent itemsets mining algorithms for data streams are most significant based on the FP-Tree structure.The FP-Growth algorithm needs to recursively generate a large number of conditional FP-Trees to save itemset information in the mining process,which consumes time and space resources.The DSFPM algorithm establishes the DSFPM-Tree structure when the first basic window flows in.After the new window arrives,the critical frequent closed itemsets of the mining are inserted into the tree in sequence.When the window slides,items of the old transaction in DSFPM-Tree need to be deleted,such a frequent update tree structure,lead to relatively large time expenditure.The pWin algorithm needs to frequently use the condition pane to update the prefix tree when there is a window inflow,and frequently deletes the items in the old window when the window slides.Aiming at the deficiencies of the above algorithms,this paper proposes a frequent itemset mining algorithm of FP-Tree in the sliding window MFI-SW.The algorithm adopts the storage method of the upper triangular matrix,and establishes a critical frequent mode tree NCFP-Tree for each basic window to compress the stored data information.Under the same number of nodes,the NCFP-Tree has a lower depth and few branches than the FP-Tree,because the parent-child nodes in the NCFP-Tree only keep all the information of the critical frequent 2-item set,so the nodes that do not meet the critical frequent 2-item set are critically pruned.When the window slides,the NCFP-Tree corresponding to the old basic window is released,and the NCFP-Tree corresponding to the new basic window is created.Finally,the critical frequent itemsets mined are saved in the critical frequent itemset table FI-i_List by the hierarchical traversal method.A lot of comparison experiments were done by extracting three data sets in IBM with DSFPM algorithm and pWin algorithm.Under the same circumstances,the time efficiency of MFI-SW algorithm is slightly higher than the other two algorithms;And under the same data set,the effects of the minimum support degree threshold and sliding window change on the three algorithms are analyzed.The results show that the MFI-SW algorithm has higher time efficiency.The error factor in the algorithm is a key factor,so the algorithm's recall rate and precision rate are compared under the error factor,the experiment results show that the MFI-SW algorithm's recall rate and precision rate are slightly higher than the other two algorithms.The above experiment proves that the MFI-SW algorithm is effective and feasible.
Keywords/Search Tags:Data stream, frequent itemset, NCFP-Tree, Critical pruning
PDF Full Text Request
Related items