Research On Frequent Itemset Mining Algorithm And Its Parallelization Based On Spark

Posted on:2017-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:C Lin

Full Text:PDF

GTID:2308330485470217

Subject:Computer application technology

Abstract/Summary:

Since the problem of frequent itemset mining (FIM) was first posed, the high time complexity of FIM algorithms has drawn many researchers to work on improving their efficiency. Traditional FIM algorithms handle big data poorly because they are limited by the computing power and memory of a single machine. Based on a comparison of current FIM algorithms and a study of the Spark framework, this thesis proposes a new itemset representation called HybridNodeset, together with a new serial FIM algorithm built on it, called HybridFIN. Experimental results show that HybridFIN performs better on different types of datasets. The thesis also applies the new itemset representation to the maximal frequent itemset mining problem and adopts a new projection strategy based on the MFI-Tree. It further proposes a parallel FIM algorithm based on Spark, called PHybridFIN, which projects the original transactional dataset into multiple conditional datasets and uses Transaction Trees to reduce the time spent on network transmission. Experimental results indicate that PHybridFIN outperforms PFP, which is implemented in Spark MLlib. Finally, the thesis improves the parallelization strategy of PHybridFIN and proposes a parallel FIM algorithm called PHybridFIN+. Experimental results show that PHybridFIN+ achieves better performance.
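
To make the abstract's terms concrete, the following is a minimal Python sketch of the frequent itemset mining problem and of projecting a transactional dataset into per-item conditional datasets. It is not HybridFIN or PHybridFIN: the miner is a plain Apriori-style level-wise search, the projection follows the prefix-based scheme commonly used with FP-growth-style algorithms, and the function names (`frequent_itemsets`, `conditional_datasets`, `mine_by_projection`) are hypothetical names chosen for illustration. The abstract does not describe the HybridNodeset structure or the Transaction Tree encoding, so neither is modelled here.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {frozenset: support count} for every
    itemset contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


def conditional_datasets(transactions, min_support):
    """Project the dataset into one conditional dataset per frequent item.
    For item i, each transaction containing i contributes the frequent
    items that rank before i (descending support, ties broken by name)."""
    counts = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    projected = {item: [] for item in order}
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(items):
            projected[item].append(items[:pos])
    return projected


def mine_by_projection(transactions, min_support):
    """Mine each conditional dataset independently and merge the results.
    In a parallel setting each conditional dataset could go to one worker."""
    result = {}
    for item, cond in conditional_datasets(transactions, min_support).items():
        result[frozenset([item])] = len(cond)  # transactions containing item
        for itemset, support in frequent_itemsets(cond, min_support).items():
            result[itemset | {item}] = support
    return result


# Toy dataset (made up for illustration only).
sample = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"],
          ["a", "b", "c", "d"]]
direct = frequent_itemsets(sample, min_support=3)
via_projection = mine_by_projection(sample, min_support=3)
```

On this toy dataset both routes yield the same six frequent itemsets. Because each conditional dataset can be mined without reference to the others, projection of this kind is a natural unit of parallel work, at the cost of shipping the projected transactions between machines.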

Keywords/Search Tags:

data mining, association rule mining, frequent itemset mining, Spark, parallel computing

Related items

1	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
2	The Research And Implementation On Association Rule Mining Algorithm Based On Spark
3	Parallel Frequent Itemset Mining Based On MapReduce
4	The Research And Implementation Of Association Rule Data Mining Algorithm
5	Research On Distributed Frequent Itemset Mining Algorithm Based On Spark
6	Research Of Parallel Frequent Itemset Mining Algorithm Based On Spark
7	Association Rule Mining Algorithms
8	Data Mining Technology And Its Applications
9	Research And Application Of Association Rule Mining Algorithm
10	The Research On Algorithm For Association Rules Mining Based On Vertical Data Presentation