
Research And Application Of Association Rules Mining Algorithm Based On Spark

Posted on: 2019-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Pan
GTID: 2348330569988293
Subject: Computer technology
Abstract/Summary:
With the continuous development of Internet technology and the spread of information networks, we have entered the era of big data. Huge volumes of data enrich our lives, but they also make data processing harder. Traditional serial data mining techniques can no longer meet the basic requirements of massive data processing, and Hadoop emerged to make in-depth analysis of massive data possible. Spark is a big data computing framework. Compared with the original Hadoop MapReduce framework, Spark computes in memory and offers good usability and high flexibility, which makes it well suited to iterative computation over large data sets. Association rule mining is one of the most important applications of data mining technology. To extract useful information, the original data set must be processed iteratively many times, which makes association rule mining on large data extremely costly. Distributed, parallel computing on the Spark framework can effectively address the problem of storing large data and improve the efficiency of processing it. In this paper, we combine the technical characteristics of the Spark framework with traditional association rule mining algorithms to obtain frequent itemset mining algorithms for a big data platform. The main work is as follows:
1. This paper proposes a Spark-based algorithm for mining frequent itemsets with projection trees. To avoid the cost of repeatedly traversing the data set, the algorithm changes the data storage structure, uses broadcast variables, and incorporates the Apriori property to reduce the transmission of intermediate variables. In this way, the algorithm takes advantage of the Spark computing framework.
2. To partition the data and parallelize the computation, a Spark-based algorithm named SPEclat is proposed. Because the serial Eclat algorithm has shortcomings when applied to mining large data sets, it is modified in several places: the data storage structure is changed to cut the time spent counting the support of candidate itemsets, and the data are partitioned by prefix, which narrows the scope of data retrieval and completes the parallelization of the algorithm. Experimental results show that these improvements are effective.
3. The SPEclat algorithm on the Spark big data framework is applied to process and analyze QAR data, turning the technique into a practical tool for extracting valuable information.
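The vertical storage and prefix partitioning described in item 2 can be illustrated with a minimal single-machine Python sketch. It assumes a vertical layout (each item mapped to the set of ids of the transactions containing it), an absolute minimum support count, and sortable item labels; all function names are hypothetical, and this is not the thesis's SPEclat code. Each prefix partition holds every frequent itemset whose smallest item is that prefix, so the partitions can be mined independently. Here they run in a plain loop; on Spark each would presumably become a separate task (for example via `SparkContext.parallelize`), with the frequent single-item tidsets shared through `SparkContext.broadcast`, but that mapping is an assumption.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert horizontal transactions to vertical form: item -> set of transaction ids."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return tidsets


def frequent_extensions(tids, later, min_support):
    """Intersect `tids` with each later item's tidset and keep only frequent results.

    Dropping infrequent extensions applies the Apriori property: their
    supersets are never generated.
    """
    result = []
    for other, other_tids in later:
        common = tids & other_tids  # support = size of the intersection
        if len(common) >= min_support:
            result.append((other, common))
    return result


def eclat(prefix, candidates, min_support, out):
    """Depth-first Eclat over one equivalence class of already-frequent extensions."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        next_class = frequent_extensions(tids, candidates[i + 1:], min_support)
        if next_class:
            eclat(itemset, next_class, min_support, out)


def mine_partition(item, tids, later, min_support):
    """Mine all frequent itemsets whose smallest item is `item` (one prefix partition)."""
    local = {(item,): len(tids)}
    eclat((item,), frequent_extensions(tids, later, min_support), min_support, local)
    return local


def prefix_partitioned_eclat(transactions, min_support):
    """Hypothetical driver: build tidsets, split by prefix, mine each partition."""
    vertical = to_vertical(transactions)
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    # One partition per frequent item: the prefix plus all frequent items after it.
    partitions = [(item, tids, frequent[i + 1:]) for i, (item, tids) in enumerate(frequent)]
    result = {}
    for item, tids, later in partitions:  # partitions are independent of each other
        result.update(mine_partition(item, tids, later, min_support))
    return result


if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(prefix_partitioned_eclat(transactions, 3).items()):
        print(itemset, support)
```

With the sample transactions and a minimum support of 3, the sketch reports the three single items (support 4) and the three pairs (support 3); {a, b, c} occurs only in transactions 0 and 4, so its support of 2 is below the threshold and it is pruned. Because a partition only needs the tidsets of its prefix and of the items after it, no partition depends on another's results, which is what makes this split convenient to parallelize.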
Keywords/Search Tags: Frequent itemset mining, Big data, Spark, Projection tree, Eclat algorithm