Research On Efficient Mining Algorithm For Rare Itemsets

Posted on:2019-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:S N Liu

Full Text:PDF

GTID:2428330590465785

Subject:Computer technology

Abstract/Summary:

Association rule mining is one of the important methods of data mining. Its main value lies in discovering potential and valuable correlations between different items in the data. According to how often they occur, items in the data can be divided into frequent items and rare items. The mining of frequent itemsets is currently the focus of attention. However, frequent itemset mining filters out rare items, while studying rare association rules can reveal many previously unknown and valuable real-world patterns, and applying these patterns in some areas can bring great economic and social benefits. Therefore, how to quickly and effectively mine rare association rules from data, so as to give decision-makers a more scientific basis for planning, is an important topic in the field of data mining. With the arrival of the big data era, data volumes are growing rapidly, so mining rare itemsets quickly and effectively from large-scale data is a key issue. Based on the distributed computing framework Spark, this thesis parallelizes rare itemset mining according to the characteristics of rare itemset mining algorithms, so that the algorithm can handle large-scale data quickly and efficiently. The main research work of this thesis is as follows:

(1) First, the thresholds and filter conditions of the DEclat algorithm are reset so that the improved DEclat' algorithm is suitable for mining rare itemsets. However, when DEclat' mines rare itemsets, the large number of intersection operations makes the algorithm inefficient. To solve this problem, the REclat algorithm, based on the idea of a hash Boolean matrix, is proposed. The proposed algorithm reduces the time required for each intersection computation, that is, the time needed to count the support of candidate sets. Theoretical analysis and comparison experiments show that REclat executes efficiently when mining rare itemsets from data sets with different numbers of transactions and different numbers of attributes.

(2) To enable REclat to mine rare itemsets effectively in a big data environment, the SP-REclat algorithm is proposed: a parallelization of REclat on the Spark framework, designed according to the characteristics of REclat. First, itemsets with the same prefix are divided into equivalence classes, so that all itemsets of the same equivalence class are assigned to the same computing node. Then, the k-item equivalence classes on the same node can be joined directly to generate (k+1)-item rare itemsets. Finally, the (k+1)-item rare itemsets generated by each node are divided into equivalence classes again. SP-REclat is called iteratively to mine rare itemsets until no more itemsets are produced. In this way, the parallelization of REclat under the Spark framework is realized. The experiments show that SP-REclat is feasible and effective, with good speedup and scalability.
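
The two ideas named in the abstract, counting support through a Boolean representation and joining itemsets that share a prefix, can be illustrated with a small single-process Python sketch. This is not the thesis's REclat or SP-REclat implementation: the function names, the use of Python integers as Boolean rows, the choice to keep every itemset that occurs at least once, and the rarity test `0 < support < max_support` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_bitmaps(transactions):
    """Return {(item,): bitmap}; bit t is set when transaction t contains the item."""
    bitmaps = defaultdict(int)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[(item,)] |= 1 << tid
    return dict(bitmaps)


def support(bitmap):
    """Support = number of set bits = number of transactions containing the itemset."""
    return bin(bitmap).count("1")


def group_by_prefix(itemsets):
    """Split sorted k-itemsets into equivalence classes keyed by their first k-1 items."""
    classes = defaultdict(list)
    for itemset in itemsets:
        classes[itemset[:-1]].append(itemset)
    return dict(classes)


def next_level(bitmaps):
    """Join itemsets sharing a prefix into (k+1)-itemsets; drop those that never occur."""
    result = {}
    for members in group_by_prefix(bitmaps).values():
        for a, b in combinations(sorted(members), 2):
            joined = bitmaps[a] & bitmaps[b]  # one AND replaces a tidset intersection
            if joined:
                result[a + (b[-1],)] = joined
    return result


def rare_itemsets(bitmaps, max_support):
    """Assumed definition: an itemset is rare when 0 < support < max_support."""
    return {s: support(b) for s, b in bitmaps.items() if 0 < support(b) < max_support}


# Hypothetical toy data, not taken from the thesis.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]
level1 = build_bitmaps(transactions)
level2 = next_level(level1)
print(rare_itemsets(level1, max_support=2))  # {('d',): 1}
print(rare_itemsets(level2, max_support=2))  # {('a', 'd'): 1, ('b', 'd'): 1, ('c', 'd'): 1}
```

In a distributed setting, the prefix used as the key in `group_by_prefix` is the kind of value one could partition on so that each equivalence class is joined on a single node; the sketch above stays on one machine and makes no claim about how the thesis distributes the work.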

Keywords/Search Tags:

association rules, rare itemsets, Eclat algorithm, parallel computing

Related items

1	Research And Application Of Parallelization Of Association Rule Mining Algorithm
2	Research On Distributed Association Rule Algorithm In Data Mining
3	Improving Research On Association Rules Eclat Algorithm
4	Research For Association Rules Algorithm On Big Data
5	CPU Parallelization And Distribution Eclat Algorithm Based On Bit Storage Type Tid
6	Research On Association Rules Mining Algorithm In Big Data Background
7	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
8	Frequent Itemsets Incremental Mining And Parallelization Based On Multi-scale
9	Research And Application Of Frequent Itemsets Mining Algorithm
10	Association Rules And Incremental Updating Of Association Rules