Research On Algorithm Of Mining Association Rules Based On Hadoop

Posted on: 2019-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: T Du
GTID: 2428330590465526
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of big data, cloud computing, and artificial intelligence, a large amount of data has accumulated in all walks of life, and the ability to analyze massive data has become an important aspect of competition among countries. In data mining, the main research topics are association mining, classification, clustering, and other mining algorithms. Among them, association rule mining can effectively uncover the valuable and interesting knowledge and rules hidden in data; it is widely applied in practical data processing and is a current research hotspot. This thesis studies association rule mining algorithms in stand-alone mode and in a big data environment. The main contents are as follows.

First, the PrePost algorithm is highly efficient in stand-alone mode but has several problems: large memory usage, a lack of efficient connection and pruning strategies, and complex construction. To address them, an improved PrePost algorithm based on B-list is proposed. The improved algorithm uses B-lists to represent the itemsets in the database, uses a set enumeration tree to represent the search space of candidate frequent itemsets, and prunes candidates that have an infrequent subset. A linear-complexity method for connecting the B-lists of two itemsets is studied, which gives a fast way to calculate the support of itemsets. Experimental results show that the improved PrePost algorithm in stand-alone mode has higher time and space efficiency and can effectively mine the frequent patterns hidden in the data.

Second, to solve the low efficiency and memory overflow of traditional mining algorithms in a big data environment, the improved PrePost algorithm is combined with Hadoop, yielding H_PrePost, an algorithm for mining association rules from big data on the Hadoop platform. H_PrePost uses the MapReduce programming model for parallel computing and a load balancing strategy to keep the cluster running efficiently. Because a big data environment produces a large number of frequent itemsets, the mined frequent patterns are evaluated with the Kulczynski measure and the imbalance ratio to ensure that they have practical application value. Experimental results show that the H_PrePost algorithm can effectively mine frequent patterns in large data sets and can meet the needs of association rule mining in big data environments.
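
The two evaluation measures named in the abstract can be illustrated with a short sketch. The abstract gives no formulas, so the definitions below are the standard textbook ones and are only an assumption about what the thesis uses. The toy transactions and all function names are hypothetical, and support is counted by a plain scan rather than with B-lists or MapReduce.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def kulczynski(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """Average of the conditional probabilities P(B|A) and P(A|B).

    Assumes sup_a and sup_b are non-zero.
    """
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def imbalance_ratio(sup_a: int, sup_b: int, sup_ab: int) -> float:
    """|sup(A) - sup(B)| divided by the number of transactions with A or B.

    Assumes at least one transaction contains A or B.
    """
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical toy database of five transactions.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread"}),
]

sup_milk = support(transactions, {"milk"})
sup_bread = support(transactions, {"bread"})
sup_both = support(transactions, {"milk", "bread"})

kulc = kulczynski(sup_milk, sup_bread, sup_both)
ir = imbalance_ratio(sup_milk, sup_bread, sup_both)
```

Under these definitions, a Kulczynski value close to 0.5 suggests no clear association between the two itemsets, while values toward 1 or 0 suggest positive or negative association. An imbalance ratio near 0 means the two itemsets occur about equally often, which helps in reading a Kulczynski value that sits near 0.5.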
Keywords/Search Tags: data mining, big data, association rules, frequent itemsets