PCFP:Parallel Frequent Itemset Mining With Multiple Minimum Supports

Posted on:2019-07-12

Degree:Master

Type:Thesis

Country:China

Candidate:H B Ji

Full Text:PDF

GTID:2348330569495546

Subject:Engineering

Abstract/Summary:

Association rule mining is an important technique for discovering implicit knowledge in large databases. Since it was first proposed, the problem has attracted wide attention from researchers. A typical application is market basket analysis, which examines the connections between the products that customers buy. Classical algorithms find the rules in the database that satisfy a single user-specified minimum support (minsup) and minimum confidence, which implicitly assumes that all items occur with similar frequencies. This is, however, seldom the case in real-life applications. In many applications, some items appear very frequently in the data, while others rarely appear. If minsup is set too high, rules that involve rare items will not be found. To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways. This dilemma is called the rare item problem. Fortunately, a method based on multiple minimum supports has been proposed to improve the mining of the sparse association rules that exist in a database.

However, most mining algorithms can handle only small data sets. When faced with large-scale data, their execution efficiency cannot meet the real-time demands of big-data-driven decision making. Cloud computing can increase the processing speed of large data sets by providing a suitable programming model. Its novelty is that it offers virtually unlimited, low-cost storage and computing power, and it therefore provides a platform for the storage and mining of massive data. MapReduce is a massively parallel programming model for processing large data sets. It hides issues such as parallelization, fault tolerance, data distribution and load balancing, so that developers can devote themselves to the algorithm design of the application's own computational problem without worrying about the details of parallelization, such as the division and storage of input data across multiple nodes. MapReduce has thus become a parallel computing model that can effectively mine frequent itemsets from TB-level or even PB-level data sets.

Because the data items in real transaction databases differ in importance and in probability of occurrence, this thesis presents a model based on how the data items appear in the transaction data set. In addition, a parallel association rule mining algorithm, PCFP-growth, is proposed on the MapReduce framework to dig out association rules that cover few data records but are meaningful and may be of greater interest to users. The experimental results show that the algorithm satisfies the different mining needs of different data sets and that it effectively increases mining efficiency when handling big data sets.
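
The rare item problem and the effect of multiple minimum supports can be illustrated with a small sketch. It is not the algorithm proposed in the thesis: it is a brute-force count arranged as a map step and a reduce step in plain Python, and it assumes the common convention that the threshold of an itemset is the lowest minimum support among its items. The transactions, the item names, the `MIS` values and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data; "caviar" plays the role of the rare item.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "caviar"},
    {"bread", "caviar"},
    {"bread", "milk"},
    {"milk"},
]

# Assumed per-item minimum supports, as fractions of all transactions.
MIS = {"bread": 0.5, "milk": 0.5, "butter": 0.3, "caviar": 0.2}


def map_phase(transaction, max_size=2):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    items = sorted(transaction)
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            yield combo, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return counts


def mine(transactions, mis, max_size=2):
    """Return {itemset: support} for itemsets that meet their own threshold."""
    n = len(transactions)
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    frequent = {}
    for itemset, count in counts.items():
        # Assumption: an itemset's threshold is the lowest MIS of its items.
        threshold = min(mis[item] for item in itemset)
        support = count / n
        if support >= threshold:
            frequent[itemset] = support
    return frequent


if __name__ == "__main__":
    with_mis = mine(TRANSACTIONS, MIS)
    single = mine(TRANSACTIONS, {item: 0.5 for item in MIS})
    print("multiple minimum supports:", sorted(with_mis))
    print("single minsup of 0.5:     ", sorted(single))
```

With a single minsup of 0.5 the itemsets containing `caviar` are dropped, while the per-item thresholds keep `("bread", "caviar")` without lowering the bar for the frequent items. On a real cluster the map and reduce steps would run on separate nodes, and enumerating every itemset would be replaced by a tree-based method such as the one the thesis describes.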

Keywords/Search Tags:

Association rules, rare item problem, Multiple minimum supports, MapReduce

Related items

1	Research And Application Of Association Rules Technology Oriented Networking Audit Platform
2	The Research On Association Rules With Multiple Minimum Supports
3	Researches And Applications On Association Rules Mining With Multiple Minimum Supports
4	Association Rule Mining Algorithm
5	Research Of Positive And Negative Association Rules Mining Technology Based On Multiple Supports
6	Weighted Negative Sequential Patterns Algorithm With Multiple Supports
7	Research Of Mining Association Rules Based On The Mutiple Minimum Supports
8	Research Of Classification Based On Negative Association Rules
9	Application Of Generalized Association Rule Mining In The Library New Book Recommendation
10	Research And Optimization Of Association Rules Based On Can Tree