Research On Parallel Mining Algorithm Of Association Pattern Based On Spark

Posted on:2022-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Deng

Full Text:PDF

GTID:2518306551971099

Subject:Master of Engineering

Abstract/Summary:

Association pattern mining is one of the classic research directions in data mining. It aims to uncover hidden associations among transactions that appear unrelated on the surface, and analysis of the mined results can guide future decision-making. In the Internet era, data is generated continuously. As the volume of data grows, so does the amount of useless information it contains, and useful knowledge remains scarce. Extracting usable knowledge from the data itself with effective techniques is therefore still an active research topic. Although existing methods have made progress, mining association patterns from today's massive data still faces great challenges: traditional stand-alone itemset mining algorithms struggle on large datasets, and in big data environments parallel itemset mining algorithms suffer from long running times, uneven load across nodes, high memory consumption and other problems. This thesis addresses these difficulties in parallel association pattern mining and carries out the following research.

(1) An analysis of the advantages of combining the Spark framework with the iterative FP-Growth algorithm. First, traditional serial itemset mining is compared with parallel itemset mining. Experiments on the webdocs, mushroom and accidents datasets show that the parallel approach can effectively handle association pattern mining on large-scale data, and that its mining speed is significantly higher than that of serial mining. Parallel itemset mining algorithms based on Hadoop and on Spark are then studied further. The experiments show that FP-Growth on the Spark framework achieves higher mining efficiency and better stability than FP-Growth on MapReduce.

(2) An improved Spark-based parallel itemset mining algorithm, Opt-SFPG, is proposed to quickly mine associations in large-scale transaction data. Building on the advantages of the Spark framework, the algorithm optimizes both the size of the generated FP-trees and the amount of computation per node. Experiments on the webdocs dataset verify the efficiency of Opt-SFPG. To analyze its performance more comprehensively, the doubly optimized Opt-SFPG algorithm is compared on the T40I10D100K and webdocs datasets with the Eq-SFPG algorithm (based on storage optimization and dynamically balanced grouping), the Ht-SFPG algorithm (based on optimization and pruning of the FP-tree header table), the traditional SFPG algorithm, and the existing IPFP-Growth and BFPG algorithms, in four respects: data scale, support threshold, speedup and number of nodes. The experimental results show that the proposed Opt-SFPG algorithm achieves higher mining efficiency and better parallelism.

(3) A method that combines the Opt-SFPG algorithm with the Spark-LDA model is proposed to mine associations among topic words in large-scale text data. First, the Spark-LDA model is used to obtain a dimension-reduced topic description matrix of the texts. The Opt-SFPG algorithm is then applied to the resulting text topic dataset. Finally, implicit topic information is obtained through in-depth analysis of the mined associations. Results on newspaper and periodical texts whose keywords and titles contain "The Belt & Road" show that the method is feasible and efficient.
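
The abstract does not describe the internals of Opt-SFPG, so the following is only a rough illustration of the general idea behind parallel FP-Growth, not the thesis's algorithm. It simulates in plain Python (no Spark dependency) the group-based partitioning commonly used by parallel FP-Growth variants: count items globally, assign frequent items to groups, send each group the transaction prefixes it needs, and let each group mine independently. The function names, the round-robin grouping policy, the `min_count` parameter and the toy transactions are all assumptions made for the example, and the local miner uses projected databases instead of a compressed FP-tree to stay short.

```python
from collections import Counter, defaultdict


def grow(cond_db, suffix, min_count, out):
    """Pattern growth on a projected database (no compressed FP-tree here)."""
    counts = Counter(item for prefix in cond_db for item in prefix)
    for item, count in counts.items():
        if count < min_count:
            continue
        itemset = (item,) + suffix
        out[frozenset(itemset)] = count
        # Keep only the part of each transaction that precedes `item`.
        projected = [p[:p.index(item)] for p in cond_db if item in p]
        grow(projected, itemset, min_count, out)


def parallel_frequent_itemsets(transactions, min_count, num_groups=2):
    """Hypothetical helper: returns {frozenset(itemset): support count}."""
    # Step 1: global item counts (on Spark this would be a map/reduce step).
    counts = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Step 2: assign frequent items to groups. Round-robin is a placeholder
    # policy; a load-balancing strategy would replace this line.
    group_of = {item: r % num_groups for item, r in rank.items()}

    # Step 3: emit one group-dependent prefix per (transaction, group).
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        for pos in range(len(items) - 1, -1, -1):
            g = group_of[items[pos]]
            if g not in seen:
                seen.add(g)
                shards[g].append(tuple(items[:pos + 1]))

    # Step 4: each group mines only itemsets whose least frequent item it owns,
    # so groups never produce the same itemset twice.
    result = {}
    for g, shard in shards.items():
        for item in (i for i in order if group_of[i] == g):
            containing = [p for p in shard if item in p]
            result[frozenset((item,))] = len(containing)
            projected = [p[:p.index(item)] for p in containing]
            grow(projected, (item,), min_count, result)
    return result


# Toy data, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = parallel_frequent_itemsets(transactions, min_count=3, num_groups=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Because each group works only on its own shard, the loop in step 4 is the part that a cluster framework would run in parallel, and the grouping policy in step 2 determines how evenly the work is spread across nodes.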

Keywords/Search Tags:

Hadoop, Spark, FP-Growth algorithm, Association pattern, Text topic mining

Related items

1	Research On Spark-based Association Rule Mining Algorithms
2	Mining Association Rules Algorithm Analysis Based On Hadoop
3	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
4	Research Of Parallel Frequent Itemset Mining Algorithm Based On Spark
5	Research On Association Rules Algorithm Based On Hadoop
6	A Study And Implementation Of Web Text Mining System Based On Spark
7	Research And Application Of Parallel FP-Growth Algorithm Based On Spark
8	Research Of FP-Growth Algorithm Based On Spark
9	Research Of FP-growth Data Mining Algorithm
10	Research On Distributed Frequent Itemset Mining Algorithm Based On Spark