Research On Apriori Algorithms Based On Distributed Platform

Posted on:2020-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:P Gu

Full Text:PDF

GTID:2518306512456664

Subject:Computer application technology

Abstract/Summary:

With the arrival of the big data era, data mining technology and its application fields have expanded, and a wide variety of data mining algorithms have emerged. Among them, the Apriori algorithm, a classical association rule mining algorithm, has attracted widespread attention, and many improved Apriori algorithms have been proposed. As the volume of data to be mined keeps growing, using a distributed cluster to scale data mining capacity and improve efficiency has real practical value. Based on an in-depth analysis of existing Apriori algorithms, this thesis presents an improved Apriori algorithm (IABL) that addresses the shortcomings of the original algorithm in four areas: the storage and representation of transaction data, the generation of candidate itemsets, the pruning effect, and the method used to count the frequency of candidate itemsets. Transaction data is compressed efficiently through the IList data structure. Candidate itemsets are generated by joining only some of the frequent itemsets, which effectively reduces the number of candidates; the Apriori property is then used to prune the candidates further, and efficient bit operations replace loop-based or search-based counting to speed up the computation and improve the efficiency of frequent itemset mining. To meet the needs of large-scale data mining, two task decomposition strategies, horizontal partitioning of the transaction database and vertical partitioning of the transaction database, are applied on the Hadoop distributed framework. The horizontal partitioning strategy first mines local frequent itemsets and then derives the global frequent itemsets from them. The vertical partitioning strategy first mines a subset of the frequent itemsets, joins some of them to build candidate itemsets, extracts the remaining frequent itemsets from those candidates, and ultimately obtains all frequent itemsets. Finally, the IABL algorithm and the Hadoop-based IABL algorithm under both task decomposition strategies are implemented and tested, and the algorithms are verified and compared on several data sets. The results show the different characteristics of the Hadoop-based IABL implementation under the two strategies, as well as the feasibility, effectiveness and efficiency of the IABL algorithm, which achieves the goals of improving and applying the Apriori algorithm.

Keywords/Search Tags:

Apriori, Frequent itemsets, Association rules, Hadoop, IABL

Related items

1	An Improved Method Of Apriori Algorithm Based On Hadoop
2	Research And Application Of Association Rules Algorithm Based On MapReduce
3	An Algorithm And Context Analysis Of Mining Frequent Closed Itemsets
4	Research On The Method Of Condensing Association Rules
5	Research On Top-K Frequent Itemsets Datamining Algorithm
6	Research On All Frequent Itemsets Mining Algorithm And Its Application To The Classification Area
7	Research For Association Rules Algorithm On Big Data
8	Research On Mining Algorithms Of Maximal Frequent Itemsets And Opened Frequent Itemsets
9	Research And Application On Association Rules Based Data Mining
10	Association Rule Mining Technology Improvements In Computer Forensics