Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams

Posted on: 2011-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: L B Wang
Full Text: PDF
GTID: 2178360302494609
Subject: Computer application technology
Abstract/Summary:
Frequent itemset mining is an important research area in data stream mining. Existing algorithms for mining frequent itemsets still have many problems. For example, pattern generation is seriously delayed; the set of mined frequent itemsets is very large; and simple mining algorithms that do not adopt constraint methods are not application-oriented and do not meet the needs of users. In response to these problems, this thesis focuses on how to mine frequent itemsets from data streams based on the FP-Tree data structure. Solving these problems is important for e-commerce, network communication, business intelligence, and similar fields.

Firstly, a new algorithm, MFCI-SW, is proposed for mining frequent closed itemsets in data streams. The data items in frequent closed itemsets are collected, and their support F and window sequence number K are stored in the FCIL. When a new basic window arrives, the MFCI-SW-Tree is pruned by deleting the data items whose K is the smallest in the FCIL and merging the new data items into the FCIL. The proposed algorithm improves the efficiency of mining frequent closed itemsets.

Secondly, a new algorithm, MFI-TD, is proposed for mining maximal frequent itemsets. A new data structure called the PW-tree (Point based Window-tree) is introduced to store each transaction in the current window, and the final node of each path that denotes a maximal frequent itemset is pointed to by the DP (domain pointer). Based on this data structure, MFI-TD gradually deletes the obsolete and infrequent itemset branches in the PW-tree by using a time decay model, so that the user can obtain the maximal frequent itemsets. The proposed algorithm is better than DSM-MFI in time efficiency.

Lastly, a novel sequential pattern mining algorithm based on a location matrix, named SPM-LM, is proposed for discovering the features of software faults. A location matrix is constructed for each event to record the frequent sequence information, which produces the frequent 1-sequences. Frequent k-sequences that have a frequent 1-sequence as their prefix are then generated through operations on the location matrix. The software fault sequences are matched in the tree structure, which improves the efficiency of fault feature discovery.

All experiments are performed on real-life datasets, and MFCI-SW, MFI-TD, and SPM-LM are each evaluated experimentally.
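The window bookkeeping described for MFCI-SW (store a support F and a window sequence number K, drop the entries with the smallest K when a new basic window arrives, then merge the new data) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class `SlidingWindowSupport`, its methods, and the parameter `max_windows` are hypothetical names, the sketch counts single items rather than closed itemsets, and it does not build the MFCI-SW-Tree.

```python
from collections import Counter


class SlidingWindowSupport:
    """Hypothetical FCIL-like list: one (item, K) -> F entry per basic window."""

    def __init__(self, max_windows):
        self.max_windows = max_windows  # basic windows kept in the sliding window
        self.entries = {}               # (item, K) -> support F within window K
        self.next_k = 0                 # sequence number K of the next basic window

    def add_basic_window(self, transactions):
        k = self.next_k
        self.next_k += 1
        # Merge step: F is the number of transactions in this window containing the item.
        counts = Counter(item for t in transactions for item in set(t))
        for item, f in counts.items():
            self.entries[(item, k)] = f
        # Prune step: delete entries whose K has slid out of the window (smallest K first).
        oldest_allowed = k - self.max_windows + 1
        self.entries = {
            key: f for key, f in self.entries.items() if key[1] >= oldest_allowed
        }

    def support(self, item):
        # Total support of an item over the basic windows still in the sliding window.
        return sum(f for (i, _), f in self.entries.items() if i == item)

    def frequent_items(self, min_support):
        totals = Counter()
        for (item, _), f in self.entries.items():
            totals[item] += f
        return {item for item, f in totals.items() if f >= min_support}


sw = SlidingWindowSupport(max_windows=2)
sw.add_basic_window([["a", "b"], ["a"]])   # K = 0
sw.add_basic_window([["a", "c"]])          # K = 1
sw.add_basic_window([["c"], ["c", "b"]])   # K = 2, entries with K = 0 are pruned
print(sw.frequent_items(min_support=2))
```

Keeping one entry per (item, K) pair makes the prune step a simple filter on K, at the cost of summing over entries when a total support is requested; a real closed-itemset miner would maintain considerably more state.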
Keywords/Search Tags: Data stream, sliding window, closed frequent itemsets, maximal frequent itemsets, location matrix