Research And Application Of Association Rules Mining Algorithm Based On Spark

Posted on:2019-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:X Pan

Full Text:PDF

GTID:2348330569988293

Subject:Computer technology

Abstract/Summary:

With the continuous advance of Internet technology and the spread of information networks, we have entered the era of big data. While huge volumes of data bring rich resources to our lives, they also make data harder to process. Traditional serial data mining can no longer meet basic requirements for massive data processing; Hadoop emerged in response and made in-depth analysis of massive data possible. Spark is a big data computing framework. Compared with the original Hadoop MapReduce framework, Spark computes in memory and offers strong usability and high flexibility, which makes it well suited to iterative computation over large data sets. Association rule mining is one of the most important applications of data mining. To extract useful information, the original data set must be processed iteratively many times, which makes association rule mining extremely cumbersome. Distributed, parallel computing on the Spark framework can effectively address the storage of large data and improve the efficiency of processing it. In this thesis, we combine the technical characteristics of the Spark framework with traditional association rule mining algorithms to obtain frequent itemset mining algorithms for a big data platform. The main work is as follows:

1. A Spark-based projection-tree algorithm for mining frequent itemsets is proposed. To avoid the cost of repeatedly traversing the data set, the algorithm changes the data storage structure, uses broadcast variables, and applies the Apriori property to reduce the transmission of intermediate variables. In this way the algorithm takes advantage of the Spark computing framework.

2. To partition the data and compute on the partitions, a Spark-based algorithm, SPEclat, is proposed. It addresses the shortcomings of the serial Eclat algorithm on large data sets with several modifications: the data storage structure is changed to reduce the time spent counting the support of candidate itemsets, and the data is partitioned by prefix to narrow the search scope, which allows the algorithm to run in parallel. Experiments show that these improvements are effective.

3. The Spark-based SPEclat algorithm is applied to process and analyze QAR data, completing the step from technique to a tool that can extract valuable information.
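
The Eclat ideas named in the abstract (a vertical storage structure for support counting, and partitioning by prefix) can be illustrated with a minimal plain-Python sketch. This is not the thesis's SPEclat implementation, which the abstract does not show. All function names here (`to_vertical`, `prefix_classes`, `extend`, `mine`) are hypothetical, and the sketch runs on a single machine. The assumption is that, on Spark, each prefix class produced by `prefix_classes` could be handed to a separate task because the classes do not depend on each other.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert transactions to vertical form: item -> set of transaction ids."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def prefix_classes(frequent, min_support):
    """Yield one independent class per frequent item (the 1-item prefix).

    Each class holds the prefix, its support, and the later items whose
    tidset intersection with the prefix is still frequent.
    """
    for i, (item, tids) in enumerate(frequent):
        extensions = []
        for other, other_tids in frequent[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                extensions.append((other, common))
        yield (item,), len(tids), extensions


def extend(prefix, candidates, min_support, out):
    """Depth-first Eclat step: grow `prefix` by intersecting tidsets."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support = size of the tidset
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                extensions.append((other, common))
        if extensions:
            extend(itemset, extensions, min_support, out)


def mine(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets."""
    vertical = to_vertical(transactions)
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    result = {}
    # Each loop iteration touches only its own class, so the classes
    # could be processed in parallel (assumption, not shown here).
    for prefix, support, extensions in prefix_classes(frequent, min_support):
        result[prefix] = support
        extend(prefix, extensions, min_support, result)
    return result
```

Calling `mine(transactions, min_support)` with a list of item sets and an absolute support threshold returns a dictionary from sorted item tuples to their support counts. Support is read directly from the size of an intersected tidset, so the original transactions are scanned only once, when the vertical form is built.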

Keywords/Search Tags:

Frequent itemset mining, Big data, Spark, Projection tree, Eclat algorithm

Related items

1	Research On Frequent Itemset Mining Algorithm And Its Parallelization Based On Spark
2	Research And Application Of Frequent Itemset Mining Algorithm
3	Research On Distributed Frequent Itemset Mining Algorithm Based On Spark
4	Research On Frequent And Closed High Utility Itemset Mining Algorithm Based On Spark
5	Research On Fast Frequent Itemsets Mining Algorithm And Their Applications
6	CPU Parallelization And Distribution Eclat Algorithm Based On Bit Storage Type Tid
7	Study Of Fast Algorithms For Frequent Itemset Mining From Uncertain Data
8	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
9	Study On Association Rules Mining Algorithm Based On FP-tree
10	Research Of An Algorithm For Frequent Closed Itemset Mining On Data Stream