Research On Distributed Association Rule Algorithm In Data Mining

Posted on:2021-05-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Chen

Full Text:PDF

GTID:2428330602471900

Subject:Information and Communication Engineering

Abstract/Summary:

As one of the important research directions in data mining, association rule mining is dedicated to discovering the rules and connections hidden in data. The in-depth development of Internet information technology and artificial intelligence has led to an exponential increase in the amount of global data, so mining algorithms need to analyze and process data in a more timely and accurate manner. Because association rule mining requires multiple iterations, its efficiency is limited by the performance of a single computer, and the traditional serial approach is unsustainable. Parallel computing platforms represented by Hadoop emerged in response, providing new ideas and reliable guarantees for processing big data and enabling the reliable storage and efficient processing of massive data. The Spark parallel computing framework, which is based on in-memory computation, offers high flexibility and scalability and handles iterative problems more efficiently. This thesis therefore uses Hadoop and Spark to implement distributed, parallel versions of optimized association rule algorithms. The main work is as follows:

1. The thesis proposes BDEclat, an Eclat variant based on an improved storage mechanism and deep pruning. As the amount of data increases, the Eclat algorithm suffers from large candidate sets and frequent join operations. BDEclat stores each transaction record list as a binary vector and computes support with the bitwise AND operation. Pre-pruning, constrained pruning, and post-pruning are combined to reduce the size of the candidate sets. Experiments verify the effectiveness of the improvement.

2. To further improve the algorithm's ability to handle massive data, the thesis proposes BPEclat, a parallel version of BDEclat built on the Spark framework. To address the imbalanced data partitioning caused by the algorithm structure and by prefix-based item partitioning, the thesis adjusts the execution order of the algorithm, takes the amount of computation into account, and uses adaptive step-size partitioning to divide the data. Experiments verify the effectiveness of the improvement.

3. The thesis proposes a clustering association rule mining model that combines BPEclat with K-Means++. For the Tennessee Eastman (TE) chemical process data set, K-Means++ and the sigma principle are used to discretize and standardize the state variables and operating variables. Operation association rules are then mined to verify the practical value and effectiveness of the model, carrying the work from technical research to practical results.
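The binary-vector storage and bitwise AND support counting described in item 1 can be illustrated with a minimal Python sketch. This is not the thesis's BDEclat implementation: the function names (`build_bitmaps`, `support`, `eclat_bitmap`) and the toy data are assumptions made for illustration, Python integers stand in for the binary vectors, and only plain minimum-support pruning is shown, since the abstract does not specify how pre-pruning, constrained pruning, and post-pruning work.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit i is set if transaction i contains it."""
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support count = number of set bits in the binary vector."""
    return bin(bitmap).count("1")


def eclat_bitmap(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Depth-first Eclat over bit vectors (illustrative sketch only)."""
    bitmaps = build_bitmaps(transactions)
    # Drop infrequent single items before the search starts.
    seeds: List[Tuple[str, int]] = sorted(
        (item, bm) for item, bm in bitmaps.items() if support(bm) >= min_support
    )
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], prefix_bm: int, candidates: List[Tuple[str, int]]) -> None:
        for i, (item, bm) in enumerate(candidates):
            joined = prefix_bm & bm  # bitwise AND replaces tidset intersection
            count = support(joined)
            if count < min_support:
                continue  # prune: no superset of this itemset can be frequent
            itemset = prefix | {item}
            frequent[itemset] = count
            extend(itemset, joined, candidates[i + 1:])

    all_transactions = (1 << len(transactions)) - 1  # vector with every bit set
    extend(frozenset(), all_transactions, seeds)
    return frequent


# Toy data, invented for this example.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = eclat_bitmap(toy, min_support=3)
```

With this toy data and a minimum support of 3, the three single items and the three pairs are frequent, while {a, b, c} appears in only two transactions and is pruned. Representing each transaction list as one integer keeps the join step to a single AND plus a bit count, which is the effect the abstract attributes to the binary-vector storage.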

Keywords/Search Tags:

Association rules, Eclat algorithm, Parallel computing, Spark, TE process

Related items

1	Research And Application Of Association Rules Mining Algorithm Based On Spark
2	Improving Research On Association Rules Eclat Algorithm
3	Research And Improvement Of Association Rules Mining Algorithm Based On Directed Graph
4	Research On Association Rules Parallel Optimization Algorithm And Application
5	Research On Association Rule Mining Algorithm Based On User Behavior Analysis
6	An Improved Algorithm Of Association Rules Based On The Spark
7	Research For Association Rules Algorithm On Big Data
8	Parallel Association Rules Algorithm Based On Hadoop
9	Research And Application Of Parallel FP-Growth Algorithm Based On Spark
10	The Research And Implementation On Association Rule Mining Algorithm Based On Spark