
Research And Application Of Parallelization Of Association Rule Mining Algorithm

Posted on: 2020-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: D X Xu
GTID: 2428330590495966
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of the economy and technology, the amount of data has grown explosively. Extracting valuable key information from such massive data has become a difficult problem. Data mining provides an effective way to solve it, and how to further improve the algorithms themselves and their efficiency in applied settings has become a hot topic in related fields. Association rule mining is an important data mining task: it discovers potential associations among items in a data set. The Apriori algorithm is the most representative association rule mining algorithm. However, when it generates candidate itemsets and computes their support, its I/O load is very heavy, and its running time needs further improvement. Spark is a distributed, memory-based big data framework that is well suited to iterative computing.

To improve the accuracy of strong association rule mining, this thesis improves the Apriori algorithm by introducing a degree of interest. The improved algorithm is named I-Apriori (Improved Apriori). To improve the running time of strong association rule mining, a Spark-based parallelization scheme for I-Apriori is designed. The scheme uses the distributed architecture and cluster scheduling mechanism of the Spark platform to distribute the transaction data set across multiple child nodes. Each child node calls transformation operations to obtain the local candidate itemsets and their support, and stores them in memory. The aggregation node then generates the global candidate itemsets and the global frequent itemsets from the local candidate itemsets. The scheme iterates this process until no next-level candidate set exists. Performance tests show that the Spark-based parallel I-Apriori algorithm can effectively mine frequent itemsets from large data sets and extract strong association rules, with high accuracy and good running time.

To better test the practicability of the parallel I-Apriori algorithm, a simple medical auxiliary diagnosis system is developed. The system combines prescription data with patients' medical history data and uses I-Apriori to recommend drugs and identify possible complications, so as to assist doctors in timely treatment and early prevention of disease. The application results show that the system can recommend drugs and flag possible complications based on the data, and that I-Apriori has practical significance for the effective use of medical big data.
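The level-wise scheme described above (local counting on each child node, a global merge on the aggregation node, iteration until no candidates remain, then rule filtering by an interest measure) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not define its degree of interest, so lift is used here as a stand-in; the partitions are ordinary lists standing in for Spark child nodes; and all function names are hypothetical. On Spark, the per-partition step would roughly correspond to a transformation such as `mapPartitions`, and the merge to an aggregation such as `reduceByKey`.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def frequent_itemsets(partitions, min_support):
    """Level-wise search: count locally per partition, then merge globally."""
    total = sum(len(p) for p in partitions)
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    supports = {}
    k = 1
    while candidates:  # stop when no next-level candidate set exists
        merged = Counter()
        for partition in partitions:  # each pass stands in for one child node
            merged.update(local_counts(partition, candidates))
        frequent = {c: n / total for c, n in merged.items() if n / total >= min_support}
        supports.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return supports


def strong_rules(supports, min_confidence, min_interest):
    """Keep rules that pass both confidence and interest (lift is an assumption)."""
    rules = []
    for itemset, support in supports.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                confidence = support / supports[antecedent]
                interest = confidence / supports[consequent]  # lift as stand-in
                if confidence >= min_confidence and interest >= min_interest:
                    rules.append((antecedent, consequent, confidence, interest))
    return rules


# Hypothetical toy data: two partitions of a five-transaction data set.
partitions = [
    [{"bread", "milk"}, {"bread", "milk", "eggs"}],
    [{"eggs"}, {"bread", "milk"}, {"eggs", "tea"}],
]
supports = frequent_itemsets(partitions, min_support=0.4)
for rule in strong_rules(supports, min_confidence=0.8, min_interest=1.0):
    print(rule)
```

Filtering on an interest measure in addition to confidence discards rules whose consequent is frequent regardless of the antecedent, which is the usual motivation for adding such a measure to support and confidence.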
Keywords/Search Tags: Apriori algorithm, association rules, frequent itemsets, parallelization, Spark, medical auxiliary diagnosis