As technology advances, many applications generate large volumes of data at very high speed, including credit fraud detection, e-commerce, web mining, stock analysis, network intrusion detection, sensor networks, and homeland security. We call this dynamically changing data a data stream, and mining valuable information from it efficiently has become an active research topic. Traditional data mining processes static data sets: the data is kept on storage devices, and mining algorithms can read it many times. Data that arrives from an open stream is different. It is transient and can usually be read only once; records may arrive at rates of millions of items per day and accumulate without bound at a single site; not all of the data can be loaded into memory; offline mining over fixed-size data sets is no longer technically feasible; and a system operating on the stream cannot control the order in which data arrives. Mining such data therefore requires new algorithms that run continuously and process data promptly as it arrives. This paper analyzes the advantages and limitations of existing data mining algorithms, with an emphasis on frequent pattern mining, and proposes algorithms that address some of the existing problems.

Many sliding-window frequent itemset mining algorithms treat the data in each window as an independent data set and mine it from scratch. They ignore the continuity of the data stream and the data that adjacent windows share, which causes a great deal of unnecessary computation and reduces mining efficiency. To solve this problem, a frequent itemset mining algorithm based on a matrix and a prefix tree is proposed. The prefix tree stores the frequent itemsets mined in the first sliding window. After the window slides, the principle of discarding incomplete frequent itemsets is applied, so the frequent itemsets of the current window need to be mined only for the items contained in the new transactions, and the prefix tree is updated accordingly. Because the prefix tree shares common prefixes among itemsets, the time spent scanning the frequent itemsets is also reduced.

Given the characteristics of data streams, research on frequent pattern mining over data streams focuses mainly on two points: first, introducing efficient data structures to store the data set so that the data can be processed quickly; second, improving the performance of the algorithm itself. On this basis, this paper designs a parallel algorithm for mining closed frequent itemsets in data streams. The vertical data format is introduced to organize the data, and itemsets are combined and their support is counted and checked through set operations. A divide-and-conquer strategy is used for the initial generation, which is parallelized with the ForkJoin framework to improve processing performance.
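
To make the vertical data format and the ForkJoin parallelization more concrete, the following is a minimal sketch under stated assumptions; it is not the algorithm proposed in this paper. It assumes an Eclat-style search in which each item maps to the set of transaction ids that contain it (its tidset), support is counted by intersecting tidsets, and each prefix class is mined as a separate ForkJoin subtask. The names `VerticalMiner`, `Node`, `ClassTask`, `toVertical`, and `mine` are hypothetical. The sketch mines a single window only and reports all frequent itemsets; the closedness check and the sliding-window update are omitted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Illustrative sketch only: all names here are assumptions, not the paper's algorithm. */
class VerticalMiner {

    /** An itemset together with its tidset (ids of the transactions that contain it). */
    static final class Node {
        final List<String> items;
        final BitSet tids;

        Node(List<String> items, BitSet tids) {
            this.items = items;
            this.tids = tids;
        }
    }

    /** Converts horizontal transactions into the vertical format: item -> tidset. */
    static Map<String, BitSet> toVertical(List<List<String>> window) {
        Map<String, BitSet> vertical = new TreeMap<>();
        for (int tid = 0; tid < window.size(); tid++) {
            for (String item : window.get(tid)) {
                vertical.computeIfAbsent(item, k -> new BitSet()).set(tid);
            }
        }
        return vertical;
    }

    /** Mines one prefix class; each non-empty sub-class is forked as its own subtask. */
    static final class ClassTask extends RecursiveTask<Map<String, Integer>> {
        private final List<Node> siblings; // frequent itemsets that share a common prefix
        private final int minSupport;

        ClassTask(List<Node> siblings, int minSupport) {
            this.siblings = siblings;
            this.minSupport = minSupport;
        }

        @Override
        protected Map<String, Integer> compute() {
            Map<String, Integer> result = new TreeMap<>();
            List<ClassTask> subtasks = new ArrayList<>();
            for (int i = 0; i < siblings.size(); i++) {
                Node a = siblings.get(i);
                result.put(String.join(",", a.items), a.tids.cardinality());
                List<Node> children = new ArrayList<>();
                for (int j = i + 1; j < siblings.size(); j++) {
                    Node b = siblings.get(j);
                    BitSet shared = (BitSet) a.tids.clone();
                    shared.and(b.tids); // support counting by set intersection
                    if (shared.cardinality() >= minSupport) {
                        List<String> items = new ArrayList<>(a.items);
                        items.add(b.items.get(b.items.size() - 1));
                        children.add(new Node(items, shared));
                    }
                }
                if (!children.isEmpty()) {
                    ClassTask task = new ClassTask(children, minSupport);
                    task.fork(); // divide: mine this sub-class in parallel
                    subtasks.add(task);
                }
            }
            for (ClassTask task : subtasks) {
                result.putAll(task.join()); // conquer: merge the sub-class results
            }
            return result;
        }
    }

    /** Returns every frequent itemset in the window (key "a,b") with its support count. */
    static Map<String, Integer> mine(List<List<String>> window, int minSupport) {
        List<Node> roots = new ArrayList<>();
        toVertical(window).forEach((item, tids) -> {
            if (tids.cardinality() >= minSupport) {
                roots.add(new Node(Collections.singletonList(item), tids));
            }
        });
        return ForkJoinPool.commonPool().invoke(new ClassTask(roots, minSupport));
    }
}
```

Calling `VerticalMiner.mine(window, minSupport)` with a list of transactions returns a map from each frequent itemset to its support. `BitSet` is used for the tidsets here because intersection and counting are single library calls; a production design for streams would also need to expire old transaction ids as the window slides, which this sketch does not attempt.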