
Research And Application Of Association Rules Algorithm Based On MapReduce

Posted on: 2020-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wang
GTID: 2438330590962459
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, the scale of data in every field keeps growing. In this big-data context, traditional data analysis methods are increasingly unable to meet people's needs, so data mining technology has emerged; it provides decision support and prediction by mining hidden information and rules from data. Association rule mining is one of the most widely used and mature data mining methods. Common association rule algorithms include the Apriori algorithm, the FP-Growth algorithm, and several improved variants. Research on and applications of these algorithms have mostly been based on stand-alone platforms. Because a single machine's memory capacity is limited, serial algorithms have limited capacity for processing large-scale data. Researchers have therefore begun porting traditional association rule algorithms to distributed platforms and parallelizing them, and parallel mining based on the Hadoop platform has become a new research direction.

Against this background, the main content of the thesis is as follows. First, the classical Apriori algorithm is introduced and parallelized under the MapReduce framework; the result is named the MR_Apriori algorithm. MR_Apriori is then combined with the HBase database and its pruning method is improved, yielding the HMR_Apriori algorithm. Second, a Hadoop distributed cluster environment is set up. Execution times of the algorithms are compared while varying the data set size and the number of cluster nodes; the experimental results show that the parallel algorithms can process large data sets and that HMR_Apriori has higher execution efficiency on large-scale data. Third, HMR_Apriori is used to mine a real score data set to analyze the correlations between courses. Finally, HMR_Apriori is applied to a course management system built on the Hadoop distributed platform. The system provides basic functions such as course performance management and, based on the data mining results, a performance warning function for students. It has practical significance and application value for school teaching management.
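
The abstract does not give the internals of MR_Apriori, so the following is only a generic sketch of how one Apriori pass is commonly expressed in map/reduce style: the map step emits a count of 1 for every candidate itemset contained in a transaction, and the reduce step sums the counts and drops candidates below the minimum support count. It runs in a single Python process rather than on Hadoop, and all function names (`map_phase`, `reduce_phase`, `generate_candidates`, `apriori_mapreduce_sketch`) are hypothetical, not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates):
    """Map step: emit (candidate, 1) for each candidate contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs, min_count):
    """Reduce step: sum counts per itemset and keep those meeting min_count."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: n for s, n in counts.items() if n >= min_count}


def generate_candidates(frequent, k):
    """Build k-itemsets whose every (k-1)-subset is frequent (Apriori pruning).

    Enumerates combinations of the items seen in frequent sets; this is
    simple rather than efficient and is an assumption of this sketch.
    """
    items = sorted({i for s in frequent for i in s})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def apriori_mapreduce_sketch(transactions, min_count):
    """Iterate map/reduce passes, one per itemset size, until no itemset is frequent."""
    candidates = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while candidates:
        # On a real cluster each transaction split would be mapped in parallel.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        frequent = reduce_phase(pairs, min_count)
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result
```

In a Hadoop setting, each pass of the loop would typically correspond to one MapReduce job over the transaction file; how the thesis stores intermediate itemsets in HBase or changes the pruning step is not specified in the abstract and is not modeled here.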
Keywords/Search Tags:Frequent itemsets, Association rules, Apriori, Hadoop, MapReduce