Research And Implement Of Frequent Itemsets Algorithm Based On Hadoop Cloud Platform

Posted on:2015-11-25

Degree:Master

Type:Thesis

Country:China

Candidate:Q Ma

GTID:2428330488999789

Subject:Software engineering

Abstract/Summary:

With the rapid development of the mobile Internet, the volume of data has grown explosively. As a result, traditional stand-alone, serial data mining algorithms can no longer meet the computing and storage demands of massive data. Hadoop, a cloud computing technology that emerged in the Big Data era, offers efficient processing, reliable storage, and a parallel programming interface; it fundamentally resolves the performance bottlenecks that the traditional model faces with large data and reduces the complexity of parallel programming. Against this background, it is particularly worthwhile to study how to parallelize traditional frequent itemset mining algorithms by drawing on Hadoop's strengths in big data processing. The main research in this thesis covers the following aspects.

First, the thesis systematically introduces the advantages of Hadoop in processing large data and the performance bottlenecks of the traditional data mining model. To overcome the performance bottlenecks of the FP-growth algorithm on large data, it proposes a parallel improvement scheme based on FP-growth. The scheme uses a divide-and-conquer approach to partition the transactional database horizontally, so that multiple nodes can work in parallel to accelerate the computation of frequent itemsets and conditional patterns. Moreover, to avoid the recursive construction of FP-trees, a new conditional pattern tree, the NFP-tree, is built by adding a prefix field of frequent items to each node of the original FP-tree, which significantly improves the speed of frequent itemset mining.

Second, building on this parallel improvement of the traditional FP-growth algorithm and on Hadoop's performance advantages in processing large data, the thesis proposes a parallel frequent itemset mining algorithm called NFP-growth. The algorithm consists of two iterative MapReduce processes: 1) finding the frequent 1-itemsets; 2) computing the conditional pattern bases and generating the frequent itemsets. This task decomposition effectively balances the load of the algorithm at each stage and improves the performance of the whole mining algorithm.

Finally, the thesis uses a simple example to verify the soundness of the NFP-growth algorithm and then runs experiments on the Hadoop platform for further verification. The experimental results show that the algorithm has good scalability and efficiency.
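The two-stage decomposition described in the abstract can be illustrated with a small single-process sketch. This is not the thesis's NFP-growth implementation: the function names, the toy transactions, and the support threshold are assumptions made for illustration, the map and reduce steps are plain Python functions rather than Hadoop jobs, and the second stage mines each conditional pattern base by brute-force subset counting instead of building an NFP-tree.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_count(shard):
    """Stage 1 map: emit (item, 1) for each distinct item of each transaction."""
    for transaction in shard:
        for item in set(transaction):
            yield item, 1


def reduce_count(pairs, min_support):
    """Stage 1 reduce: sum the counts and keep the frequent 1-itemsets."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return {item: c for item, c in counts.items() if c >= min_support}


def map_conditional(shard, frequent):
    """Stage 2 map: emit (item, prefix) entries of the conditional pattern bases."""
    # Rank frequent items by descending support; ties are broken by item name.
    ordered_items = sorted(frequent, key=lambda x: (-frequent[x], x))
    rank = {item: i for i, item in enumerate(ordered_items)}
    for transaction in shard:
        ordered = sorted((i for i in set(transaction) if i in rank), key=rank.get)
        for pos in range(1, len(ordered)):
            yield ordered[pos], tuple(ordered[:pos])


def reduce_conditional(pairs, min_support):
    """Stage 2 reduce: count itemsets ending in each item (brute force, no tree)."""
    bases = defaultdict(list)
    for item, prefix in pairs:
        bases[item].append(prefix)
    result = {}
    for item, prefixes in bases.items():
        counts = Counter()
        for prefix in prefixes:
            for k in range(1, len(prefix) + 1):
                for combo in combinations(prefix, k):
                    counts[combo + (item,)] += 1
        result.update({s: c for s, c in counts.items() if c >= min_support})
    return result


def mine(shards, min_support):
    """Run both stages over horizontally partitioned transactions."""
    stage1 = (pair for shard in shards for pair in map_count(shard))
    frequent = reduce_count(stage1, min_support)
    itemsets = {(item,): c for item, c in frequent.items()}
    stage2 = (pair for shard in shards for pair in map_conditional(shard, frequent))
    itemsets.update(reduce_conditional(stage2, min_support))
    return itemsets


# Hypothetical toy database, split horizontally into two shards.
shards = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["b", "c"], ["a", "b", "c", "d"]],
]
result = mine(shards, min_support=3)
# With min_support=3 the frequent itemsets are a, b, c (support 4 each)
# and ab, ac, bc (support 3 each); abc has support 2 and is dropped.
```

Each shard is processed independently in both map steps, which is the property that lets the horizontal partitioning run on multiple nodes. The brute-force reducer is exponential in prefix length and is only suitable for toy inputs; a tree-based structure such as the one the thesis proposes would replace it in practice.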

Keywords/Search Tags:

Frequent Itemset, MapReduce, conditional pattern base, parallel, FP-growth

Related items

1	Research Of Parallel Frequent Itemset Mining Algorithm Based On MapReduce
2	Parallel Frequent Itemset Mining Based On MapReduce
3	Research On Parallel Frequent Itemset Mining Algorithm Based On MapReduce
4	Research And Realization Of Parallel Algorithm For Mining Frequent Closed Itemsets
5	Research On Parallelization Of Frequent Itemsets Mining Algorithm
6	Research On Parallelization And Load Balancing Of Frequent Pattern Mining Algorithm Based On MapReduce
7	Research On Mining Frequent Itemsets Algorithm Based On Bittable
8	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
9	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
10	New algorithms for frequent sequential pattern and itemset data mining in certain and uncertain databases