Font Size: a A A

The Research And Implementation Of Mining Frequent Itemsets Algorithm Over Streaming Data

Posted on:2020-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:Y HuangFull Text:PDF
GTID:2428330590971733Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The data stream is real-time,high-speed,and infinite.Mining frequent itemsets on streaming data has been widely studied and utilized in many fields,such as satellite monitoring data analysis,web click stream analysis and weather meteorological data analysis.Compared with static data,streaming data has more valuable information.Nonetheless,data items of streaming data are dynamically changed along with time,which brings a new challenge to mining frequent itemsets.In addition,the overwhelming majority of existing work concerning the mining frequent itemsets focus on static data,such that these work might not be effectively utilized on streaming data.Further,the number of the maximal frequent itemsets is relatively small and all the frequent itemsets are already included,so mining the maximal frequent itemsets on streaming data has good space-time efficiency and has guiding significance for mining association rule.To address these issues,the main research work of this thesis is as follows:Firstly,this thesis proposes an algorithm for mining frequent itemsets over streaming data based on nested window model,named NWFI.This thesis first gives the size of the external sliding window,and maps all the data items to the timeline when the streaming data arrives,then a time-decay model has been used to reduce the weight of historical transactions,and thus distinguish between recent transaction data and historical transactions data.In addition,the expected window of each data item has been considered to determine the size of the inner mining window,and finally mine the frequent itemsets.The experimental results show that compared with the previous work,the NWFI algorithm can better adapt to the time-varying feature of streaming data,and significantly improve performance by reducing the transaction data used for mining.Secondly,in the nested window model,the calculated window only contains the recent frequent itemsets,and there is no need to further pruning the expired data.Therefore,a novel structure named NW-tree is designed,which can quickly delete and update data by manipulating queue information of tail nodes.On this basis,this thesis proposes a novel and efficient algorithm based on the nested window model,named NW-MFI,to extract the maximal frequent itemsets in the streaming data.The algorithm maintains a tail-node queue and a prefix tree which contains the relative information oftransaction data in memory.When mining the maximal frequent itemsets,starting from the tail nodes,the maximum frequent itemsets are mined from bottom to top.Experimental results show that the NW-MFI algorithm has better space-time efficiency.The research shows that the maximal frequent itemsets mining algorithm based on nested window model can not only adapt to the time-varying feature of streaming data,but also effectively improve the efficiency of mining frequent itemsets.It has important theoretical and practical significance.
Keywords/Search Tags:Data Stream, Frequent Itemsets, Nested Window Model, Maximal Frequent Itemsets, NW-tree
PDF Full Text Request
Related items