Study On Mining Closed Frequent Itemset Based On Hadoop

Posted on: 2013-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: G P Chen
Full Text: PDF
GTID: 2298330431962034
Subject: Computer technology
Abstract/Summary:
Closed Frequent Itemset Mining is a useful way of discovering association rules in data. As large-scale datasets become more common, mining closed frequent itemsets in parallel has become a significant and challenging problem. Hadoop, a cloud computing infrastructure, offers a promising solution. This thesis studies large-scale Closed Frequent Itemset Mining based on Hadoop.

Firstly, the cloud computing platform Hadoop is described in detail, especially its two main components: the distributed file system HDFS and the distributed data processing system MapReduce. Its working principles and advantages are also introduced and analyzed.

Secondly, a parallel algorithm for mining closed frequent itemsets based on Hadoop is presented. The algorithm consists of four main steps: (1) parallel counting, (2) construction of the global frequent list, (3) parallel mining of local closed frequent itemsets, and (4) parallel filtering of global closed frequent itemsets. The AFOPT-close algorithm is adapted to MapReduce and used to mine the local closed frequent itemsets, and a parallel filtering method then identifies the global closed frequent itemsets among the local results. Experimental results validate the method and show that it is effective, achieving a satisfactory speedup.

Finally, a parallel balanced mining algorithm for closed frequent itemsets based on Hadoop is proposed. It uses a greedy strategy to group items so that the computational load is balanced across all parallel tasks. The algorithm consists of three main steps: (1) parallel counting, (2) global construction of the frequent list and the group map, and (3) parallel mining of closed frequent itemsets. Experimental results validate the method and show its effectiveness: satisfactory speedup and scalability are both achieved on large-scale Closed Frequent Itemset Mining tasks.
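The first two steps of the balanced algorithm (parallel counting, then construction of the frequent list and group map) can be illustrated with a small single-process Python sketch. This is only an illustration under stated assumptions, not the thesis's Hadoop implementation: the function names, the sample data, and the use of item support as the per-item load estimate for the greedy grouping are all hypothetical, since the abstract does not say how load is estimated.

```python
from collections import Counter


def map_count(shard):
    """Map phase: emit (item, 1) once per transaction that contains the item."""
    for transaction in shard:
        for item in set(transaction):
            yield item, 1


def reduce_count(pairs):
    """Reduce phase: sum the emitted counts per item."""
    totals = Counter()
    for item, n in pairs:
        totals[item] += n
    return totals


def build_frequent_list(counts, min_support):
    """Keep items meeting min_support, sorted by descending support."""
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def greedy_group_map(frequent_list, num_groups):
    """Assign each item to the currently least-loaded group.

    Assumption: an item's support stands in for its mining cost.
    """
    loads = [0] * num_groups
    group_map = {}
    for item, support in frequent_list:
        target = min(range(num_groups), key=loads.__getitem__)
        group_map[item] = target
        loads[target] += support
    return group_map, loads


# Hypothetical data: two shards, as if split across two mappers.
shards = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"], ["a"]],
]
pairs = [pair for shard in shards for pair in map_count(shard)]
counts = reduce_count(pairs)
frequent_list = build_frequent_list(counts, min_support=2)
group_map, loads = greedy_group_map(frequent_list, num_groups=2)
print(frequent_list, group_map, loads)
```

In a real Hadoop job the map and reduce functions would run as distributed tasks over HDFS blocks, and each group in the group map would be handed to one mining task in the final step.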
Keywords/Search Tags: Data Mining, Closed Frequent Itemset, Hadoop, MapReduce, Cloud Computing