The Research On Distributed Incremental Update Method Of Frequent Item Set Based On Flink

Posted on: 2022-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Song
Full Text: PDF
GTID: 2518306731479854
Subject: Computer technology
Abstract/Summary:
As data mining converges with other industries and data volumes grow rapidly, users can plan their next development strategy by analyzing the useful information contained in massive data. Because users need to extract this valuable information quickly and accurately, research on data mining technology has attracted increasing attention. Frequent Itemset Mining (FIM) is one of the most important techniques in the field of data mining, and users can apply it to find useful information in their data. However, traditional full-volume frequent itemset mining struggles with massive, rapidly changing data. The large amount of data generated today makes databases change dynamically, and re-mining the full database after every change leads to a great deal of redundant computation, poor adaptability, and low mining efficiency. To address these problems, the main work of this thesis is as follows.

First, because traditional full-volume frequent itemset mining algorithms adapt poorly to rapidly changing dynamic databases, this thesis proposes INCMU, an incremental update model based on Flink. The model reuses the frequent itemsets that existed in the database before the update and performs mining only on the incremental part, that is, the newly added transaction dataset. The thesis also proves theoretically that the results computed by the model are correct, so that the model gives users a sound basis for making correct decisions.

Second, this thesis proposes an incremental update method that reuses existing frequent itemset mining results. The method computes the global frequent itemsets of the updated database by means of set operations and itemset support queries. In particular, it keeps the frequent itemsets of a dynamic database up to date by promptly updating itemset support and removing itemsets that are no longer frequent.

Third, the incremental update model is general to a degree, and the thesis states the conditions that a frequent itemset mining algorithm must satisfy in order to be used with the model. As an example, the parallel Apriori algorithm is combined with the incremental update model, and the implementation process is described in detail.

The experimental results show that the incremental update model has significant advantages over full-volume mining, and that the performance improvement is larger when the well-known WebDocs dataset is used for verification.
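The incremental update idea summarized in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's INCMU model and does not use Flink; the function names (`count_support`, `apriori`, `incremental_update`) and the sample data are hypothetical, and the level-wise miner is a simplified Apriori without subset pruning. The sketch relies on a standard observation: an itemset that is frequent in the updated database must be frequent in the old database or in the increment, so the union of the two result sets is a complete candidate set.

```python
from itertools import combinations


def count_support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_ratio):
    """Simplified level-wise mining (hypothetical helper).

    Returns {frozenset: count} for itemsets whose count is at least
    min_ratio * len(transactions).
    """
    threshold = min_ratio * len(transactions)
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while level:
        frequent = {}
        for cand in level:
            c = count_support(cand, transactions)
            if c >= threshold:
                frequent[cand] = c
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        level = list({a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k})
    return result


def incremental_update(old_db, old_frequent, increment, min_ratio):
    """Update frequent itemsets after `increment` is appended to `old_db`.

    Assumption: `old_frequent` holds exact counts for the itemsets that
    were frequent in `old_db` at the same `min_ratio`.
    """
    # Mine only the newly added transactions.
    inc_frequent = apriori(increment, min_ratio)
    threshold = min_ratio * (len(old_db) + len(increment))
    updated = {}
    # Set operation: candidates are the union of both result sets.
    for itemset in set(old_frequent) | set(inc_frequent):
        old_count = old_frequent.get(itemset)
        if old_count is None:
            # Support query against the old database.
            old_count = count_support(itemset, old_db)
        inc_count = inc_frequent.get(itemset)
        if inc_count is None:
            inc_count = count_support(itemset, increment)
        total = old_count + inc_count
        # Itemsets that fall below the new threshold are dropped.
        if total >= threshold:
            updated[itemset] = total
    return updated


# Hypothetical sample data for illustration only.
old_db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
increment = [{"c", "d"}, {"a", "c", "d"}]
old_frequent = apriori(old_db, 0.5)
updated = incremental_update(old_db, old_frequent, increment, 0.5)
```

With this sample data, `{a, b}` is frequent before the update but drops out afterwards, while `{d}` becomes frequent only once the increment is counted. A distributed version would additionally need to partition the support counting, which this sketch does not attempt.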
Keywords/Search Tags: Frequent Itemset Mining, Dynamic database, Incremental updating, Apache Flink