Research And Implement Of Frequent Itemsets Algorithm Based On Hadoop Cloud Platform

Posted on: 2015-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ma
Full Text: PDF
GTID: 2428330488999789
Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile Internet, the volume of data has grown explosively. As a result, traditional stand-alone, serial data mining algorithms can no longer meet the computing and storage demands of massive data. Hadoop cloud computing technology, a product of the Big Data era, offers efficient processing, reliable storage and a parallel programming interface; it resolves the performance bottlenecks of the traditional model in handling large data and reduces the complexity of parallel programming. Against this background, it is particularly worthwhile to study how to exploit Hadoop's strengths in big data processing to parallelize traditional frequent itemset mining algorithms. The main work of this thesis is as follows. Firstly, the thesis systematically introduces the advantages of Hadoop in processing large data and the performance bottlenecks of the traditional data mining model. To overcome the performance bottlenecks of the FP-growth algorithm on large data, a parallel improvement scheme based on FP-growth is proposed. The scheme applies a "divide and conquer" approach that splits the transactional database horizontally, so that multiple nodes can work in parallel to accelerate the computation of frequent itemsets and conditional patterns. Moreover, to avoid the recursive construction of FP-trees, a new conditional pattern structure, the NFP-tree, is built by adding to each node of the original FP-tree a field that holds its prefix of frequent items, which significantly speeds up frequent itemset mining. Secondly, building on this parallel improvement of the traditional FP-growth algorithm and on Hadoop's performance advantages in processing large data, the thesis proposes a parallel frequent itemset mining algorithm called NFP-growth. The algorithm consists of two MapReduce iterations: 1) finding the frequent 1-itemsets; 2) computing the conditional pattern bases and generating the frequent itemsets. This task decomposition balances the load of the algorithm across its stages and improves the performance of the whole mining algorithm. Finally, the thesis uses a simple example to verify the soundness of the NFP-growth algorithm, and then validates it further with an experiment on the Hadoop platform. The experimental results show that the algorithm has good scalability and efficiency.
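The two-pass decomposition described in the abstract can be illustrated with a small in-memory sketch. It is not the thesis's NFP-growth implementation and does not use Hadoop: the function names (`split_horizontally`, `map_count`, `map_prefix_paths`, and so on), the round-robin sharding, the sample transactions and the minimum support of 2 are all assumptions made for illustration. The first pass counts item supports across horizontal shards; the second emits, for each frequent item, the prefix paths that make up its conditional pattern base in the usual FP-growth sense.

```python
from collections import Counter, defaultdict
from itertools import chain


def split_horizontally(transactions, n_shards):
    """Assign whole transactions to shards round-robin (a horizontal split)."""
    return [transactions[i::n_shards] for i in range(n_shards)]


def map_count(shard):
    """Pass 1, map step: emit (item, 1) once per transaction containing the item."""
    for transaction in shard:
        for item in set(transaction):
            yield item, 1


def reduce_sum(pairs):
    """Pass 1, reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def frequent_items(shards, min_support):
    """Run pass 1 over all shards and keep items that reach min_support."""
    counts = reduce_sum(chain.from_iterable(map_count(s) for s in shards))
    return {item: n for item, n in counts.items() if n >= min_support}


def map_prefix_paths(shard, freq):
    """Pass 2, map step: emit (item, prefix path) for each frequent item."""
    for transaction in shard:
        # Keep frequent items only, ordered by descending support, ties by name.
        ordered = sorted(
            (i for i in set(transaction) if i in freq),
            key=lambda i: (-freq[i], i),
        )
        for pos in range(1, len(ordered)):
            yield ordered[pos], tuple(ordered[:pos])


def conditional_pattern_bases(shards, freq):
    """Pass 2, reduce step: group prefix paths by item and count duplicates."""
    bases = defaultdict(Counter)
    for shard in shards:
        for item, prefix in map_prefix_paths(shard, freq):
            bases[item][prefix] += 1
    return bases


# Hypothetical sample data, chosen only to exercise the sketch.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
shards = split_horizontally(transactions, n_shards=2)
freq = frequent_items(shards, min_support=2)
bases = conditional_pattern_bases(shards, freq)
```

Because every transaction is counted exactly once regardless of which shard holds it, the supports and conditional pattern bases do not depend on the number of shards, which is what lets the horizontal split be processed by independent nodes.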
Keywords/Search Tags: Frequent Itemset, MapReduce, conditional pattern base, parallel, FP-growth