| Frequent itemset mining is a fundamental problem in data mining. Traditional frequent itemset mining considers only the frequency of itemsets and ignores the noise that is unavoidable in real data. In practice, items differ in importance depending on their meaning or value, and data also contain implicit frequent itemsets that do not match exactly. Existing approaches have the following problems: (1) Traditional frequent itemset mining cannot mine implicit frequent itemsets, so it outputs few frequent itemsets and often fails to discover potentially valuable ones. (2) Traditional fault-tolerant frequent itemset mining does not consider the importance of itemsets, so it cannot reflect the value of the data in terms of profit. (3) In real applications, the amount of data to be processed is very large, which makes it time-consuming for users to process the results; traditional frequent itemset mining does not solve this problem. In view of these problems, the main work and innovations of this paper are as follows: (1) Through a comprehensive study of the state of research on data mining, association rule mining, weighted frequent itemset mining, and fault-tolerant frequent itemset mining at home and abroad, the relevant algorithms of recent years are analyzed, and their advantages and existing problems are summarized. (2) This paper proposes a high-weight fault-tolerant frequent itemset mining algorithm based on a Weighted Dynamic Tree (HWFT-WDT). The algorithm mines high-weight fault-tolerant frequent itemsets so that users obtain more complete itemsets together with importance information. A weighted dynamic tree data structure is proposed, which stores the weight of each node and facilitates the calculation of the average weight. Only one weighted dynamic tree is used, which avoids the high cost of constructing multiple subtrees. Three pruning strategies are proposed to effectively reduce the search space during
mining. Experimental results show that the proposed algorithm is superior to the FT-Pattern Growth algorithm and the FT-Apriori algorithm in terms of running time, storage space, and scalability. (3) As big data technology matures, many data mining algorithms use distributed platforms to improve their performance and efficiency. To meet the requirement of mining large datasets quickly, this paper proposes using the Spark platform to implement a distributed, parallel version of the HWFT-WDT algorithm: the Parallel High-Weight Fault-Tolerant itemset mining algorithm based on Weighted Dynamic Tree (PHWFT-WDT). Simulation results show that the algorithm meets the requirements of mining high-weight fault-tolerant frequent itemsets in a big data environment and greatly improves performance, which demonstrates that it is effective and feasible. In conclusion, this paper combines the theory of fault-tolerant frequent itemset mining with that of weighted frequent itemset mining, establishes a new data structure, proposes the HWFT-WDT algorithm, and implements distributed parallelization of the algorithm on the Spark platform. The experimental results show that, compared with advanced fault-tolerant frequent itemset mining algorithms, the HWFT-WDT algorithm performs well on sparse, dense, small, large, and very large datasets. |
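
To make the notion of a high-weight fault-tolerant frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the HWFT-WDT algorithm: it uses no tree and no pruning. It assumes definitions that are common in the literature but that the abstract does not spell out: a transaction supports an itemset if it misses at most `tolerance` of its items, and the weight of an itemset is the average of its item weights. The per-item support condition used in some fault-tolerant formulations is omitted. All function names, thresholds, and the sample data are hypothetical.

```python
from itertools import combinations


def ft_support(itemset, transactions, tolerance):
    """Count transactions that contain all but at most `tolerance` items of `itemset`."""
    need = len(itemset) - tolerance
    return sum(1 for t in transactions if len(itemset & t) >= need)


def average_weight(itemset, weights):
    """Average of the item weights (assumed definition of itemset weight)."""
    return sum(weights[i] for i in itemset) / len(itemset)


def mine_high_weight_ft(transactions, weights, min_ft_support,
                        min_avg_weight, tolerance, max_size=3):
    """Enumerate every candidate itemset and keep those passing both thresholds."""
    items = sorted(set().union(*transactions))
    results = {}
    # Start above `tolerance` so every candidate must match at least one item.
    for size in range(tolerance + 1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = ft_support(itemset, transactions, tolerance)
            if (support >= min_ft_support
                    and average_weight(itemset, weights) >= min_avg_weight):
                results[itemset] = support
    return results


# Hypothetical toy data for illustration only.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
weights = {"a": 0.9, "b": 0.6, "c": 0.8, "d": 0.2}

results = mine_high_weight_ft(transactions, weights, min_ft_support=3,
                              min_avg_weight=0.72, tolerance=1)
for itemset, support in results.items():
    print(sorted(itemset), support)
```

On this toy data, {a, b, c} appears exactly in only one transaction, but with a tolerance of one missing item it is supported by three, which is the kind of implicit itemset that exact-match mining misses. Exhaustive enumeration grows exponentially with the number of items, which is why a compact structure and pruning strategies such as those the abstract describes are needed in practice.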