Research Of Frequent Pattern Mining Algorithm Based On Hadoop

Posted on:2019-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:B Jiang

Full Text:PDF

GTID:2348330542481622

Subject:Information and Communication Engineering

Abstract/Summary:

Frequent pattern mining is an important problem in data mining research. With the arrival of the big data era, database sizes have grown sharply, and mining frequent patterns efficiently from large transaction databases remains a challenge. A good approach to this challenge is to parallelize the mining algorithm, and the emergence of cloud computing provides an effective way to do so. Hadoop is a well-established, efficient, open source distributed system architecture that is widely used in cloud computing. This thesis studies Hadoop-based frequent pattern mining in a cloud computing environment and proposes two Hadoop-based frequent pattern mining algorithms, MRdFIN and HFPM.

MRdFIN is a new MapReduce-based parallel algorithm derived from dFIN, and it adopts a depth-first search strategy. A balanced grouping strategy based on a greedy approach is also designed to balance the load across the whole cluster. During depth-first mining on each independent node, MRdFIN reduces the search space and improves running efficiency by combining set-enumeration tree search with the closure property between itemsets and the superset equivalence property.

To address the limitations of relying on a single search strategy, the HFPM algorithm, based on a hybrid mining idea, is proposed on the basis of MRdFIN. It uses the hybrid search strategy of the PamPh algorithm to shift automatically from breadth-first mining to depth-first mining and to perform both simultaneously, which combines the advantages of breadth-first and depth-first search. It also uses mixset, the mixed vertical data format of the Peclat algorithm, to choose opportunistically between tidsets and diffsets, saving intermediate storage space and computation time. On this basis, HFPM adopts a fast pruning strategy based on an ordered search tree and applies different pruning techniques at different stages of the mining process to improve pruning efficiency. It further applies an intersection optimization strategy based on a fail-fast mechanism, which improves mining efficiency. Finally, the experimental results show that MRdFIN and HFPM have higher efficiency and scalability than the existing parallel frequent pattern mining algorithms. Of the two, MRdFIN focuses more on execution speed, while HFPM is better suited to large-scale databases.
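
The abstract does not give the actual procedures used in MRdFIN or HFPM, so the following Python sketch is only an illustration of two ideas it names: greedy balanced grouping and a fail-fast tidset intersection. It assumes that each item has an estimated mining cost and that tidsets are stored as sorted lists of transaction ids; the function names, the cost estimate, and the specific greedy rule (heaviest item first, into the lightest group) are hypothetical choices, not taken from the thesis.

```python
import heapq


def balanced_groups(item_loads, num_groups):
    """Greedy grouping (assumed rule): take items from heaviest to lightest
    and put each one into the group with the smallest total load so far."""
    heap = [(0, g) for g in range(num_groups)]  # (total load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    ordered = sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, load in ordered:
        total, g = heapq.heappop(heap)  # currently lightest group
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups


def intersect_fail_fast(tids_a, tids_b, min_support):
    """Intersect two sorted tid lists, giving up as soon as the result
    can no longer reach min_support. Returns None for infrequent results."""
    i = j = 0
    out = []
    while i < len(tids_a) and j < len(tids_b):
        # Upper bound on the final size: matches so far plus the shorter tail.
        remaining = min(len(tids_a) - i, len(tids_b) - j)
        if len(out) + remaining < min_support:
            return None  # fail fast: support threshold is unreachable
        if tids_a[i] == tids_b[j]:
            out.append(tids_a[i])
            i += 1
            j += 1
        elif tids_a[i] < tids_b[j]:
            i += 1
        else:
            j += 1
    return out if len(out) >= min_support else None


if __name__ == "__main__":
    # Hypothetical per-item cost estimates, split across two worker groups.
    print(balanced_groups({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, 2))
    # Two hypothetical tidsets with a minimum support of 3.
    print(intersect_fail_fast([1, 2, 3, 5, 8], [2, 3, 5, 9], 3))
```

In a MapReduce setting, a grouping of this kind would typically decide which reducer mines which items, while the early exit avoids finishing intersections whose result is already known to be infrequent.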

Keywords/Search Tags:

data mining, frequent pattern, big data, Hadoop

Related items

1	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
2	A Study On Algorithms Of Weighted Frequent Pattern Mining
3	The Study On Frequent Patterns Mining And Data Predicting Over Data Streams
4	Research On Frequent Pattern Of Tree Data
5	The Research On The Related Problems Of Association Rule Mining
6	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
7	Research On The Mining Algorithm Based On Data Streams
8	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
9	The Analysis, Based On Data Mining Algorithms For Frequent Pattern Tree
10	Research Of Frequent Pattern Mining Technology And Its Application In Real-time Signal Processing