Research On Improvement Of Apriori Algorithm Based On Hadoop Platform

Posted on: 2021-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Xuan
Full Text: PDF
GTID: 2428330611497652
Subject: Computer technology
Abstract/Summary:
With the steady progress of science and technology and the rapid development of society, computer technology has penetrated every field. With the arrival of the data era, however, data volumes have exploded, and the Apriori algorithm, one of the classic data mining algorithms, can no longer meet current needs. Its shortcomings (slow speed, poor results, and the limited computing performance of a single machine) are magnified in the context of big data. How to adapt the Apriori algorithm to this new environment has become a popular research direction. This thesis addresses these shortcomings by improving the algorithm and combining it with a popular cloud computing platform, so that the Apriori algorithm can better adapt to the data era and work more efficiently. The research work is divided into two main parts.

First, the thesis analyzes why the traditional Apriori algorithm is slow and performs poorly at current data scales, and improves it with respect to two problems: overly frequent database scans and too many self-join operations during iteration. (1) The database scanning mechanism is optimized: instead of scanning the database in every iteration, as the original algorithm does, the improved algorithm scans in phases, which reduces the number of scans. (2) The number of self-join comparisons is reduced: the (k-1)-item candidate set no longer joins with itself; instead it is joined with the frequent 1-itemsets to generate the new k-item candidate set, which is then used for comparison. Comparing two examples that use the same data shows that the improved algorithm is more efficient.

Second, the core components of the Hadoop platform are analyzed, its advantages of large scale, low cost, and high reliability are combined with the improved Apriori algorithm, and the feasibility and effectiveness of the combination are examined. Finally, experiments and analysis show that the improved algorithm combined with the Hadoop platform is more efficient than the traditional Apriori algorithm.
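
The candidate-generation change described in the abstract (joining (k-1)-itemsets with frequent 1-itemsets rather than self-joining them) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation: the function name `frequent_itemsets`, the use of absolute support counts, and the choice to extend the frequent (k-1)-itemsets are assumptions made for illustration. The phased-scan optimization and the Hadoop integration are not reproduced here; the sketch still scans the transactions once per level.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: count} for every itemset seen in at least
    min_support transactions (hypothetical helper, absolute counts)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent_items = [next(iter(s)) for s in level]
    result = dict(level)

    while level:
        # Assumed reading of the abstract: extend each (k-1)-itemset with
        # one frequent 1-item instead of self-joining the (k-1)-itemsets.
        candidates = set()
        for itemset in level:
            for item in frequent_items:
                if item in itemset:
                    continue
                cand = itemset | {item}
                # Standard Apriori pruning: every (k-1)-subset must be frequent.
                subsets = combinations(cand, len(cand) - 1)
                if all(frozenset(sub) in level for sub in subsets):
                    candidates.add(cand)

        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        result.update(level)

    return result
```

Calling `frequent_itemsets([["a", "b"], ["a", "b", "c"], ["a", "c"]], 2)` returns a dictionary keyed by `frozenset` itemsets with their support counts. On a Hadoop cluster the counting pass would presumably be the part distributed across mappers and reducers, but the abstract does not give that design, so it is left out of the sketch.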
Keywords/Search Tags: Data mining, Association rules, Apriori algorithm, Hadoop platform