Font Size: a A A

Research On High Utility Pattern Mining Methods In Data Stream

Posted on:2024-03-26Degree:MasterType:Thesis
Country:ChinaCandidate:C Y FangFull Text:PDF
GTID:2568307106967889Subject:Computer technology
Abstract/Summary:
As one of the classic research directions in the field of data mining,high-utility pattern mining is widely used in scenarios such as market retail,stock forecasting,and network attack and defense.In these scenarios,data is no longer stored in the database in a static form,but is often presented as continuous,unbounded,and high-speed streaming data.How to efficiently mine valuable patterns in this real-time data stream has brought new requirements and challenges to researchers.Aiming at the problems existing in existing algorithms such as high cost of data structure construction,low efficiency of threshold boosting and pruning strategies,and too large algorithm search space,this paper mainly does the following work:(1)Starting from the study of static data,a new algorithm FAMILY for quickly mining high average utility itemsets is proposed.First of all,in order to overcome the problem of a large number of invalid intersection operations in the iterative construction process of the traditional Utility-List,a new data structure EAU-List is proposed,which can be constructed by breadth-first traversal links,and the stored information such as average utility helps the FAMILY algorithm to explore new candidate patterns.Secondly,in order to further compress the search space of the algorithm and prune more candidate itemsets,a maximum average utility upper bound called rmau is proposed.A large number of experimental results show that the FAMILY algorithm is superior to related algorithms in terms of running time,memory consumption and scalability.At the same time,this research also lays the foundation for mining Top-k high-utility itemsets in data streams.(2)Based on the sliding window model,two algorithms Top UMS and Top UMS(opt)for mining Top-k high-utility itemsets from data streams are proposed.First,a dynamic utility storage structure u List is designed.Like EAU-List,this structure also avoids the problem of high construction cost of traditional Utility-List,and it can be dynamically updated with the flow of data flow.Secondly,in order to make the algorithm obtain a higher initial utility value and shorten the mining time,three threshold raising strategies are introduced by using the utility of all 1-itemsets,batch utility and transaction utility,and a large number of candidate itemsets are pruned,and proposed a new utility upper bound further compresses the search space of the algorithm.Based on the introduced multiple threshold raising strategies and the new utility upper bound,we further optimized the Top UMS algorithm and proposed the Top UMS(opt)algorithm.Finally,from the perspective of different k values,different window sizes,different batch sizes and scalability,the running time and memory usage of the algorithm are compared with related algorithms.The results show that the Top UMS algorithm performs better,and the optimized Top UMS(opt)algorithm also has a certain performance improvement than the Top UMS algorithm.
Keywords/Search Tags:Data stream, High average utility itemset, Top-k High utility itemset, Threshold raising strategy, Utility upper-bound
Related items