Association Rule Algorithm Optimization And Parallelization Research Based On Spark

Posted on: 2020-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Liu
GTID: 2438330578455901
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, every industry generates more and more data, and analyzing these data to extract important information has become particularly important. Association rule analysis is now widely used across industries and is increasingly important in data mining, so how to use it to mine information that is valuable to users has become a research hotspot. The Apriori algorithm is an important association rule algorithm, but it still has shortcomings, such as repeatedly scanning the transaction dataset and generating a large number of redundant rules.

To address these problems, this paper proposes a strategy for improving Apriori based on the firefly algorithm, resulting in a new algorithm, YHC-Apriori (Firefly algorithm-Apriori). Because the firefly algorithm optimizes automatically and quickly, it can screen the mined association rules so that the retained rules are more useful to users, thereby enhancing the efficiency of the Apriori algorithm. In addition, to reduce the high time cost of generating frequent itemsets from candidate itemsets, the concept of interest degree is introduced to make the pruning phase of Apriori more efficient, which improves the time efficiency of the algorithm. Building on these improvements, and so that Apriori can better handle large data in the era of data explosion, YHC-Apriori is parallelized on the Spark platform.

The improved YHC-ABS algorithm proposed in this paper is compared with the existing parallel Apriori algorithm YAFIM on different datasets, different computing platforms, and different minimum support thresholds; the comprehensive evaluation results show that the improved parallel algorithm proposed in this paper is more efficient. The performance of YHC-Apriori is also greatly improved compared with that of Apriori, which demonstrates the effectiveness of the proposed improvement strategy. Finally, the proposed algorithm is applied to the diagnosis of gastric cancer. The results of various physiological indicators from the physical examinations of cancer patients are used as the mining data, and the association rule algorithm is used to discover correlations among several main symptoms of gastric cancer. The conclusions are fed back to the hospital to assist doctors in diagnosing patients and thereby improve the accuracy rate of disease diagnosis.
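
For readers unfamiliar with the baseline, the following is a minimal sketch of classic level-wise Apriori, showing where the repeated dataset scans and the pruning step occur. It is not the thesis's YHC-Apriori: it contains no firefly optimization and no Spark code. The abstract does not define its interest degree, so lift is used here purely as an assumed stand-in for rule screening, and all function and parameter names (`frequent_itemsets`, `rules`, `min_lift`) are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori. Returns a dict {frozenset: support}."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Every level re-reads the transaction list (here once per candidate);
        # this is the repeated scanning the abstract names as a shortcoming.
        counted = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence, min_lift=1.0):
    """Generate rules, screening by confidence and by lift.

    Lift is an ASSUMED stand-in for the thesis's undefined interest degree.
    """
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = sup / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift > min_lift:
                    out.append((antecedent, consequent, confidence, lift))
    return out


# Toy data, invented for illustration only.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b"}),
]
freq = frequent_itemsets(transactions, min_support=0.5)
kept = rules(freq, min_confidence=0.6)
```

On this toy data the rule a → b passes the confidence threshold but has lift below 1, so the lift filter drops it, while a → c and c → a are kept. This illustrates how an interestingness measure can remove redundant rules that support and confidence alone would accept.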
Keywords/Search Tags:Association Rule, Spark, Firefly Algorithms, Medical Big Data, Apriori