Parallel Association Rules Mining Based On Distributed Framework

Posted on:2020-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:N Xie

Full Text:PDF

GTID:2428330623465362

Subject:Software engineering

Abstract/Summary:

Association rule mining is a common method for extracting valuable information from big data. It aims to find frequently occurring items and highly correlated information in the data. Current data volumes far exceed the processing power of single-machine algorithms. At the same time, traditional parallel association rule mining methods suffer from large I/O overhead, poor scalability, low computational efficiency and high consumption of computing resources. To address these problems, parallel association rule mining algorithms based on distributed frameworks are proposed. They build on the Apriori and FP-Growth algorithms and combine data structures such as the Bloom filter and the hash tree.

Firstly, based on the Hadoop MapReduce framework and the Bloom filter, a parallel frequent itemset mining algorithm, P-FIM, is proposed, which needs only two MapReduce passes. It improves computing efficiency by reducing the number of MapTasks, streamlining the transaction sets, avoiding the generation of global candidate sets, and thereby reducing I/O overhead. Secondly, a dynamic association rule mining algorithm, D-Apriori, based on the Spark distributed framework, the Bloom filter and the hash tree, is proposed to mine frequent itemsets iteratively. A dynamic adaptive optimization method selects the mining mode with the higher computational efficiency, so as to maximize overall efficiency.

Experiments validate the effectiveness of the two algorithms using several evaluation metrics for parallel algorithms. Compared with four mainstream algorithms under different support thresholds and data sets, both algorithms show good computational efficiency. In addition, the two algorithms are implemented on Spark and Hadoop, and the effect of each framework on the algorithms is observed. Both frameworks can mine large data sets quickly; Spark yields a larger improvement for the iterative algorithm D-Apriori, and Hadoop is more suitable for the P-FIM algorithm, which has high memory requirements. This thesis has 43 figures, 13 tables and 63 references.
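
To make the building blocks named in the abstract concrete, the following is a minimal single-machine sketch of Apriori-style frequent itemset mining in which a Bloom filter holds the frequent (k-1)-itemsets used for candidate pruning. It is not the thesis's P-FIM or D-Apriori algorithm, and it does not use Hadoop or Spark; the names `BloomFilter`, `itemset_key` and `frequent_itemsets`, the filter size, and the choice of SHA-256 hashing are all assumptions made for illustration. Items are assumed to be strings that do not contain commas.

```python
import hashlib
from itertools import combinations


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the thesis.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def itemset_key(itemset):
    """Canonical string form of an itemset (assumes comma-free string items)."""
    return ",".join(sorted(itemset))


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Store the frequent (k-1)-itemsets in a Bloom filter.
        bloom = BloomFilter()
        for s in current:
            bloom.add(itemset_key(s))
        item_set = {i for s in current for i in s}

        counts = {}
        for t in transactions:
            # Drop items that no longer occur in any frequent itemset.
            kept = sorted(i for i in t if i in item_set)
            for cand in combinations(kept, k):
                # Apriori pruning: every (k-1)-subset must be frequent.
                # A Bloom false positive only lets an extra candidate through;
                # its exact count is still checked against min_support below.
                if all(itemset_key(sub) in bloom
                       for sub in combinations(cand, k - 1)):
                    fs = frozenset(cand)
                    counts[fs] = counts.get(fs, 0) + 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets and an absolute support count returns every itemset that occurs in at least `min_support` transactions. Because the Bloom filter never produces false negatives, no frequent itemset is lost; the trade-off is that a few non-frequent candidates may be counted unnecessarily.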

Keywords/Search Tags:

Association Rules, Distributed Framework, Data Structure, Apriori, FP-Growth

Related items

1	Research Of Association Rules Algorithm And Application In Data Mining
2	Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing
3	Research On Log-based Association Rules Analysis Method
4	The Research And Implementation Of Association Rules Algorithms-Apriori Based On Cloud Computing
5	Research And Application Of Incremental Association Rules Algorithm Based On An Improved FP-tree
6	Research On Association Rules Algorithm Based On Hadoop
7	Research And Application Of Association Rules Mining Based On FP-growth Algorithm
8	Mining Association Rules Algorithm Analysis Based On Hadoop
9	Application Research Of Association Rules In Reading Data Processing Of College Library
10	Application And Research Of The Association Rules Based On Data Mining