Research On Distributed Association Rule Algorithm

Posted on:2018-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:T B Gao

GTID:2428330566967363

Subject:Communication and Information System

Abstract/Summary:

Association rule mining is an important branch of data mining. With the rapid development of computer technology and the Internet, the daily data volumes in finance, telecommunications, and insurance are growing explosively, which gives distributed association rule algorithms broad room for development. Existing parallel Apriori algorithms suffer from repeated database scans, high memory consumption, and heavy inter-node communication, and these problems cannot all be optimized at the same time.

This dissertation proposes a parallel Apriori algorithm that converts the original database into a Boolean matrix and a weight matrix, which reduces memory consumption. The matrix is cut horizontally into n smaller matrices, and a maximum itemset length is introduced to limit the generation of candidate itemsets that have little practical significance. The support and the average weight are calculated by matrix operations, which shortens the running time of the algorithm. A minimum support and a minimum average weight are used to reduce the generation of candidate itemsets.

The main work of this dissertation is as follows:

(1) Research on the Hadoop distributed system. The core technologies and operating mechanisms of Hadoop are introduced, including the distributed file system (HDFS), the HBase database, and the MapReduce computing framework. The basic concepts of data mining and the requirements and basic framework of a Hadoop-based data mining system are introduced, and a system model is given.

(2) Improvement of the parallel Apriori algorithm. To address the repeated database scans, high memory consumption, heavy inter-node communication, and high I/O load of existing parallel Apriori algorithms, this dissertation proposes a parallel Apriori algorithm based on weighted itemsets. The algorithm uses the minimum average weight and the minimum support to limit the generation of infrequent itemsets, calculates the support and the average weight of each itemset with matrices, and sets a maximum length for frequent itemsets of practical significance. All frequent itemsets are generated with a single scan of the database.

(3) Experimental verification of the improved algorithm. A Hadoop distributed cluster is built, and the AprioriMR algorithm is compared with the weighted-itemset parallel Apriori algorithm in terms of data size, number of nodes, and minimum support over the transaction records. The results show that, for a fixed minimum support, the improved algorithm becomes more efficient as more nodes are used. When the minimum support rises beyond a certain point, however, fewer candidates exceed it, and the efficiency gain of the improved algorithm diminishes. When the number of nodes grows beyond a certain level, the time needed to merge the results from the nodes also increases, and the efficiency of the improved algorithm decreases.
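The matrix-based counting described in the abstract can be illustrated with a small single-machine sketch. It is not the dissertation's implementation. Several points are assumptions made for illustration only: each item carries one weight, the average weight of an itemset is the mean of its item weights, the horizontal blocks stand in for the data held by separate nodes, and candidates are enumerated exhaustively up to the maximum length instead of by Apriori's join-and-prune step. All function and parameter names are hypothetical.

```python
from itertools import combinations


def to_boolean_matrix(transactions, items):
    """One row per transaction, one column per item; 1 if the item occurs."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def split_rows(matrix, n):
    """Cut the matrix horizontally into at most n blocks of near-equal size."""
    size = -(-len(matrix) // n)  # ceiling division
    return [matrix[i:i + size] for i in range(0, len(matrix), size)]


def local_count(block, cols):
    """Map-like step: count rows of one block containing every column in cols."""
    return sum(1 for row in block if all(row[c] for c in cols))


def mine(transactions, weights, min_support, min_avg_weight, max_len, n_blocks=2):
    """Return {itemset: (support, average weight)} for itemsets passing both thresholds.

    Assumption: average weight = mean of the per-item weights in the itemset.
    Assumption: transactions is non-empty.
    """
    items = sorted(weights)
    matrix = to_boolean_matrix(transactions, items)  # built in one pass over the data
    blocks = split_rows(matrix, n_blocks)
    total = len(matrix)
    frequent = {}
    for k in range(1, max_len + 1):  # max_len caps the itemset length
        for cols in combinations(range(len(items)), k):
            itemset = tuple(items[c] for c in cols)
            avg_weight = sum(weights[i] for i in itemset) / k
            if avg_weight < min_avg_weight:
                continue  # weight check needs no data access, so do it first
            # Reduce-like step: sum the per-block counts.
            support = sum(local_count(b, cols) for b in blocks) / total
            if support >= min_support:
                frequent[itemset] = (support, avg_weight)
    return frequent


# Toy data, invented for this example.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
weights = {"a": 0.9, "b": 0.6, "c": 0.3}
frequent = mine(transactions, weights, min_support=0.5, min_avg_weight=0.5, max_len=2)
for itemset, (support, avg_weight) in sorted(frequent.items()):
    print(itemset, round(support, 2), round(avg_weight, 2))
```

In this toy run the item c and the pair (b, c) are dropped by the average-weight threshold even though they are frequent, which shows how the second threshold shrinks the candidate set before any counting is done.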

Keywords/Search Tags:

Association rule, Hadoop, weighted itemset, matrix

Related items

1	Research On Algorithm Of Mining Association Rules Based On Matrix
2	Based On The Matrix Of Weighted Association Rules Mining Algorithm
3	Research On A Distributed Weighted Association Rule Mining Algorithm Base On Hadoop
4	Research On Expanded Models Of Association Rules
5	A Study Of The Association Rule Mining EARM Algorithm And It's Application
6	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
7	Research And Improvement Of Weighted Association Rule Mining Algorithm
8	Research On Association Rule Mining Algorithm Application In Customer Churn Prediction
9	Research Of Fast Association Rule Mining Method Based On Equivalence Class Transformation
10	Multiple Association Rules Mining In Human Resources Based On Matrix