Research On Distributed Association Rule Algorithm In Data Mining

Posted on: 2021-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Chen
GTID: 2428330602471900
Subject: Information and Communication Engineering
Abstract/Summary:
As one of the important research directions in data mining, association rule mining is dedicated to discovering the rules and connections hidden in data. The in-depth development of Internet information technology and artificial intelligence has led to an exponential increase in the volume of global data, so mining algorithms need to analyze and process data in a more timely and accurate manner. Because association rule mining requires multiple iterations, its efficiency is limited by the performance of a single computer, and the traditional serial approach is no longer sustainable. Parallel computing platforms, represented by Hadoop, emerged in response; they offer new ideas and reliable guarantees for big data processing and provide reliable storage and efficient processing of massive data. The Spark parallel computing framework, which is based on in-memory computation, offers high flexibility and scalability and handles iterative problems more efficiently. This thesis therefore uses Hadoop and Spark to implement distributed, parallel versions of optimized association rule algorithms. The main contributions are as follows:

1. A BDEclat algorithm based on an improved storage mechanism and deep pruning is proposed. As the amount of data increases, the Eclat algorithm suffers from large candidate sets and frequent join operations. In BDEclat, binary vectors store the transaction record lists, and the bitwise AND operation is used to compute support counts. Pre-pruning, constrained pruning, and post-pruning are combined to reduce the size of the candidate sets. Experimental results verify the effectiveness of the improvement.

2. To further improve the algorithm's ability to handle massive data, a parallel version of BDEclat on the Spark framework, named BPEclat, is proposed. To address the imbalanced data partitioning that arises from the algorithm structure and from prefix-item partitioning, the execution order of the algorithm is adjusted, the amount of computation is taken into account, and adaptive step-size partitioning is used to divide the data. Experiments verify the effectiveness of the improvement.

3. A clustering association rule mining model combining BPEclat and K-Means++ is proposed. For the TE chemical process data set, K-Means++ and the sigma principle are used to discretize and standardize the state variables and operating variables. Operation association rules are then mined to verify the practical value and effectiveness of the model, carrying the work from technical research to practical results.
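
The bit-vector idea behind the first contribution can be illustrated with a minimal sketch. This is not the thesis's BDEclat implementation: it is a plain Eclat search in which each item's transaction list is held as a Python integer used as a bit vector (an assumption made for illustration), support is obtained by a bitwise AND followed by a count of set bits, and the only pruning applied is the standard minimum-support check. The function names `build_bitmaps`, `support`, and `eclat` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bits: int) -> int:
    """Support count = number of set bits in the transaction bit vector."""
    return bin(bits).count("1")


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    bitmaps = build_bitmaps(transactions)
    # Drop infrequent single items before the search starts.
    frequent_items = sorted(
        ((item, bits) for item, bits in bitmaps.items() if support(bits) >= min_support),
        key=lambda pair: pair[0],
    )
    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], prefix_bits: int,
               candidates: List[Tuple[str, int]]) -> None:
        for i, (item, bits) in enumerate(candidates):
            joined = prefix_bits & bits  # bitwise AND replaces tid-list intersection
            count = support(joined)
            if count < min_support:
                continue  # no superset of an infrequent itemset can be frequent
            itemset = prefix + (item,)
            result[frozenset(itemset)] = count
            # Only later items are tried, so each itemset is generated once.
            extend(itemset, joined, candidates[i + 1:])

    all_transactions = (1 << len(transactions)) - 1  # bit vector with every tid set
    extend((), all_transactions, frequent_items)
    return result
```

Calling `eclat(transactions, min_support)` on a list of item sets returns a dictionary from frozen itemsets to support counts. The additional constrained pruning and post-pruning steps mentioned in the abstract are not specified there and are left out of this sketch.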
Keywords/Search Tags:Association rules, Eclat algorithm, Parallel computing, Spark, TE process