Research On Frequent Itemsets Mining Algorithm Based On MapReduce

Posted on:2020-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Guo

Full Text:PDF

GTID:2428330590471598

Subject:Electronic and communication engineering

Abstract/Summary:

Big data is not only a hot topic but also an important resource. Data is everywhere in daily life, and extracting useful information from it is very important. Data mining brings the value of data into daily production, so it has become an important technology for processing and analyzing big data. Data mining has several branches, such as classification, association analysis, cluster analysis and anomaly detection, among which association analysis is a hot topic. The main research contents on association analysis are as follows:

A study of classical data mining algorithms shows that they generally suffer from low efficiency and high memory consumption. To address this, an improved Apriori algorithm combined with a genetic algorithm (GNA) is proposed, which finds frequent itemsets based on a new genetic algorithm. The strength of the Apriori algorithm is that it is simple and easy to implement, but its joining and generation of candidate sets is too complicated, and it must rescan the database for each round of candidate sets. These defects are the main reason for the low efficiency and high memory consumption of the Apriori algorithm. By using a genetic algorithm to optimize the search space together with Apriori's pruning strategy, a new algorithm is studied that simplifies the joining and generation of Apriori's candidate sets through constrained crossover and mutation operators.

Traditional data mining algorithms run in standalone (single-machine) mode, and their efficiency is not sufficient for big data mining. Therefore, the improved Apriori algorithm is combined with Hadoop, and a MapReduce-based parallel mining algorithm for big data association patterns (Mr_GNA) is proposed. The Mr_GNA algorithm combines the GNA algorithm with Hadoop's MapReduce parallel computing framework to parallelize the algorithm. To ensure that Mr_GNA mines efficiently on a Hadoop cluster, a reasonable load balancing strategy is adopted. The mined frequent patterns are evaluated using the Kulczynski coefficient and the support imbalance ratio (IR).

The experimental results show that the improved Apriori algorithm combined with a genetic algorithm has advantages in time complexity, memory consumption and mining efficiency compared with the Apriori and NSFI algorithms. The improved big data mining algorithm is more efficient in cluster mode and outperforms parallel big data mining algorithms such as MRApriori and PFP-Growth. This shows that the Mr_GNA algorithm can effectively mine frequent patterns and meet the needs of big data mining.
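
As a rough illustration of two ideas named in the abstract, the sketch below counts itemset support in a map/reduce style and then evaluates a pattern with the Kulczynski coefficient and the imbalance ratio (IR). It is not the thesis's GNA or Mr_GNA algorithm: it runs in plain Python rather than on Hadoop, it enumerates every k-item subset instead of using genetic crossover and mutation, and the function names and the toy transactions are assumptions made for this example. The two measures follow their standard textbook definitions, Kulc(A,B) = (P(A|B) + P(B|A)) / 2 and IR(A,B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A ∪ B)).

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, k):
    """Map step: emit (itemset, 1) for every k-item subset of each transaction."""
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return counts


def kulczynski(sup_a, sup_b, sup_ab):
    """Average of the two conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def imbalance_ratio(sup_a, sup_b, sup_ab):
    """How unevenly A and B occur; 0 means perfectly balanced."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical toy data, not taken from the thesis.
transactions = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["milk"],
    ["bread", "butter"],
]

single = reduce_phase(map_phase(transactions, 1))
pairs = reduce_phase(map_phase(transactions, 2))

sup_milk = single[("milk",)]
sup_butter = single[("butter",)]
sup_both = pairs[("butter", "milk")]  # itemset keys are sorted tuples

print(kulczynski(sup_milk, sup_butter, sup_both))
print(imbalance_ratio(sup_milk, sup_butter, sup_both))
```

In a real Hadoop job the map step would run per input split and the reduce step per key. Both measures depend only on support counts, so they can be computed after the counting job finishes. A Kulczynski value near 0.5 together with a high IR is usually read as a weak, skewed association.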

Keywords/Search Tags:

data mining, apriori, association analysis, parallelization

Related items

1	Research And Application Of Parallelization Of Association Rule Mining Algorithm
2	Research And Application Of Apriori Algorithm In Association Rules Mining
3	Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing
4	Research On The Key Technology Of Information Association Based On Data Mining
5	The Application Of Association Rule Mining In Analysis On Property Crimes
6	Analysis,Optimization And Application On The Algorithms Of Mining Association Rules
7	Research Of Data Mining By Association Rules And Its Application To The Analysis Of Academic Achievements
8	Association Rule Mining Algorithms, Data Mining Technology
9	Research On Optimization Of Association Rule Apriori Algorithm And Its Parallelization Based On Spark
10	Research And Application Of Apriori Mining Algorithm Based On Association Rules In Colleges Management System