PCFP: Parallel Frequent Itemset Mining With Multiple Minimum Supports

Posted on: 2019-07-12  Degree: Master  Type: Thesis
Country: China  Candidate: H B Ji  Full Text: PDF
GTID: 2348330569495546  Subject: Engineering
Abstract/Summary:
Association rule mining is an important technique for discovering implicit knowledge in large databases. Since it was first proposed, the problem has attracted wide attention from researchers. A typical application is market basket analysis, which examines the connections between the products that customers buy. The classical algorithms find all rules in the database that satisfy a user-specified minimum support (minsup) and minimum confidence. Using a single minsup for the whole database implicitly assumes that all items have similar frequencies. This is, however, seldom the case in real-life applications. In many applications, some items appear very frequently in the data, while others rarely appear. If minsup is set too high, rules that involve rare items will not be found. To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways. This dilemma is called the rare item problem. Fortunately, methods based on multiple minimum supports have been proposed to improve the mining of the sparse association rules that exist in a database.

However, most mining algorithms can handle only small data sets. When faced with large-scale data, their execution efficiency cannot meet the real-time demands of big-data-driven decision making. Cloud computing can increase the processing speed of large data sets by providing a suitable programming model. The novelty of cloud computing is that it provides virtually unlimited, inexpensive storage and computing power, and therefore a platform for storing and mining massive data. MapReduce is a massively parallel programming model for processing large data sets. It hides issues such as parallelization, fault tolerance, data distribution and load balancing, so that developers can concentrate on the algorithm design of the application's own computational problem without worrying about the details of parallelization, such as the division and storage of input data across multiple nodes. MapReduce has thus become a parallel computing model that can effectively mine frequent itemsets from TB-level or even PB-level data sets.

Because the items in real transaction databases differ in importance and in frequency of occurrence, this thesis presents a model based on the data items that appear in the transaction data set. In addition, a parallel association rule mining algorithm, PCFP-growth, is proposed on top of the MapReduce framework. Its aim is to uncover association rules that cover fewer records but are meaningful and may be of greater interest to users. The experimental results show that the algorithm satisfies the different mining needs of different data sets and effectively improves mining efficiency when handling large data sets.
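
To make the rare item problem concrete, the following is a minimal sketch and not the thesis's PCFP-growth algorithm. It assumes the common convention for multiple minimum supports in which each item has its own minimum item support (MIS) and an itemset is frequent when its support reaches the lowest MIS among its items. The map and reduce steps are simulated in a single process by brute-force enumeration. The function names, the toy transactions and the MIS values are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size):
    """Emit (itemset, 1) for every itemset of up to max_size items."""
    items = sorted(set(transaction))
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return counts


def frequent_itemsets(transactions, mis, max_size=2):
    """Keep itemsets whose support reaches the lowest MIS of their items.

    Assumption: this threshold rule is one common convention for
    multiple minimum supports; the thesis may define it differently.
    """
    n = len(transactions)
    pairs = (p for t in transactions for p in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    return {
        itemset: c / n
        for itemset, c in counts.items()
        if c / n >= min(mis[i] for i in itemset)
    }


# Hypothetical toy data: bread and milk are common, caviar and vodka are rare.
transactions = [
    ["bread", "milk"],
    ["bread", "milk"],
    ["bread", "milk", "caviar", "vodka"],
    ["bread"],
    ["milk"],
    ["bread", "milk"],
    ["bread", "caviar", "vodka"],
    ["milk", "bread"],
    ["bread", "milk"],
    ["milk"],
]
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.2, "vodka": 0.2}

result = frequent_itemsets(transactions, mis)
single = frequent_itemsets(transactions, {item: 0.5 for item in mis})
```

With the per-item thresholds, the rare pair (caviar, vodka) is kept in `result`, whereas a single minsup of 0.5 in `single` discards every itemset that contains a rare item. A real MapReduce job would distribute the map and reduce steps across nodes and would avoid enumerating all candidate itemsets.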
Keywords/Search Tags: Association rules, Rare item problem, Multiple minimum supports, MapReduc