Research On Apriori Algorithms Based On Distributed Platform

Posted on: 2020-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: P Gu
GTID: 2518306512456664
Subject: Computer application technology
Abstract/Summary:
With the arrival of the big data era, data mining technology and its application fields have expanded, and a wide range of data mining algorithms has emerged. Among them, the Apriori algorithm, a classical association rule mining algorithm, has attracted widespread attention, and many improved variants have been proposed. As the volume of data to be mined keeps growing, using distributed clusters to extend processing capacity and improve efficiency has considerable practical value.

Based on an in-depth analysis of existing Apriori algorithms, this thesis presents an improved Apriori algorithm (IABL) that addresses their shortcomings in four areas: the storage and representation of transaction data, the generation of candidate itemsets, the effectiveness of pruning, and the method used to count the frequency of candidate itemsets. Transaction data is compressed efficiently with the IList data structure. Candidate itemsets are generated by joining only some of the frequent itemsets, which effectively reduces the number of candidates produced; the Apriori property is then used to prune the candidates further. Efficient bit operations replace loop-based or search-based counting, which speeds up support computation and the mining of frequent itemsets.

To meet the needs of large-scale data mining, two task decomposition strategies are applied on the Hadoop distributed framework: horizontal partitioning of the transaction database and vertical partitioning of the transaction database. The horizontal strategy first mines local frequent itemsets and then derives the global frequent itemsets from them. The vertical strategy first mines part of the frequent itemsets, joins them to build candidate itemsets, and then extracts the remaining frequent itemsets from those candidates, ultimately obtaining all frequent itemsets.

Finally, the IABL algorithm and the Hadoop-based IABL algorithm under both task decomposition strategies are implemented, run, and compared on several data sets. The results show the different characteristics of the Hadoop-based IABL implementation under the two strategies, as well as the feasibility, effectiveness, and efficiency of IABL, which meets the goals of improving and applying the Apriori algorithm.
Keywords/Search Tags: Apriori, Frequent itemsets, Association rules, Hadoop, IABL