Research on Frequent Pattern Mining Algorithms Based on Hadoop

Posted on: 2019-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: B Jiang
GTID: 2348330542481622
Subject: Information and Communication Engineering
Abstract/Summary:
Frequent pattern mining is an important problem in data mining research. With the arrival of the big data era, database sizes have grown sharply, and mining frequent patterns efficiently from large transaction databases remains a challenge. A good approach to this challenge is to parallelize the mining algorithm, and the emergence of cloud computing provides an effective platform for doing so. Hadoop is widely used in cloud computing as a well-established, efficient open-source distributed system architecture. This thesis studies Hadoop-based frequent pattern mining algorithms in the cloud computing environment and proposes two of them, MRdFIN and HFPM. MRdFIN is a new MapReduce-based parallel algorithm derived from dFIN that adopts a depth-first search strategy. It also includes a balanced grouping strategy based on a greedy approach, which balances the load across the whole cluster. When deep mining is carried out on each independent node, MRdFIN reduces the search space and improves running efficiency by combining set-enumeration tree search with the closure property between itemsets and the superset equivalence property. To address the limitations of relying on a single search strategy, HFPM, an algorithm based on a hybrid mining idea, is proposed on the basis of MRdFIN. It uses the hybrid search strategy of the PamPh algorithm to shift automatically from breadth-first mining to depth-first mining and to perform both simultaneously, which combines the advantages of breadth-first and depth-first search. It also uses mixset, the mixed vertical data format of the Peclat algorithm, to choose opportunistically between tidsets and diffsets, saving intermediate storage space and computation time. In addition, HFPM adopts a fast pruning strategy based on an ordered search tree, applies different pruning techniques at different stages of the mining process to improve pruning efficiency, and uses an intersection optimization strategy based on a fail-fast mechanism to improve mining efficiency. Finally, experimental results show that MRdFIN and HFPM achieve higher efficiency and scalability than existing parallel frequent pattern mining algorithms. Of the two, MRdFIN is more focused on execution speed, while HFPM is better suited to large-scale databases.
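
Two of the ideas named in the abstract, greedy balanced grouping and fail-fast intersection, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how per-item load is estimated or what the exact fail-fast condition is, so the sketch assumes a precomputed load estimate per item and an early exit once the intersection can no longer reach the minimum support. The names balanced_groups, intersect_fail_fast, item_loads, and min_support are hypothetical.

```python
import heapq


def balanced_groups(item_loads, num_groups):
    """Greedily assign items to groups, heaviest first, always to the lightest group.

    item_loads: dict of item -> estimated mining cost (an assumed estimate,
    not something the abstract defines).
    Returns (groups, totals): the items in each group and each group's total load.
    """
    heap = [(0, g) for g in range(num_groups)]  # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    # Sort by descending load; break ties by item name for determinism.
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)  # least-loaded group so far
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    totals = [0] * num_groups
    for total, g in heap:
        totals[g] = total
    return groups, totals


def intersect_fail_fast(tids_a, tids_b, min_support):
    """Intersect two sorted tid lists, giving up early when support is unreachable.

    Returns the intersection if it has at least min_support elements, else None.
    """
    result = []
    i = j = 0
    while i < len(tids_a) and j < len(tids_b):
        # Upper bound on the final size: matches so far plus the shorter remainder.
        if len(result) + min(len(tids_a) - i, len(tids_b) - j) < min_support:
            return None  # fail fast: the candidate cannot be frequent
        if tids_a[i] == tids_b[j]:
            result.append(tids_a[i])
            i += 1
            j += 1
        elif tids_a[i] < tids_b[j]:
            i += 1
        else:
            j += 1
    return result if len(result) >= min_support else None
```

In this sketch, balanced_groups({"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}, 2) places "a" and "d" in one group and "b", "c", "e" in the other, so both groups carry a load of 10. The intersection routine stops scanning as soon as the remaining transaction ids cannot lift the candidate itemset to the support threshold.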
Keywords: data mining, frequent pattern, big data, Hadoop