Research And Application Of Parallelization Of Association Rule Mining Algorithm

Posted on:2020-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:D X Xu

GTID:2428330590495966

Subject:Computer technology

Abstract/Summary:

In recent years, with the rapid development of the economy and technology, the volume of data has grown explosively. Faced with massive data, extracting valuable key information has become a difficult problem. Data mining offers an effective way to address it, and improving both the algorithms themselves and their efficiency in applied settings has become an active research topic. Association rule mining is an important data mining task: it uncovers latent associations among items in the data. The Apriori algorithm is the most representative association rule mining algorithm. However, generating candidate itemsets and computing their support imposes a heavy I/O load, and its running time leaves room for improvement. Spark is a distributed, memory-based big data framework well suited to iterative computation.

To improve the accuracy of strong association rule mining, this thesis improves the Apriori algorithm by introducing a degree-of-interest measure. The improved algorithm is named I-Apriori (Improved Apriori). To improve the timeliness of strong association rule mining, a Spark-based parallelization scheme for I-Apriori is designed. The scheme uses the distributed architecture and cluster scheduling mechanism of the Spark platform to distribute the transaction data set across multiple child nodes. Each child node applies transformation operations to obtain the local candidate itemsets and their support, and stores them in memory. The aggregating node then generates the global candidate itemsets and global frequent itemsets from the local candidate itemsets. This process iterates until no next-level candidate set exists. Performance experiments show that the parallel I-Apriori algorithm on Spark can effectively mine frequent itemsets from large data sets and extract strong association rules with high accuracy and timeliness.

To further test the practicality of the parallel I-Apriori algorithm, a simple medical auxiliary diagnosis system is developed. The system combines prescription data with patients' medical history data and uses I-Apriori to recommend drugs and identify possible complications, assisting doctors in timely treatment and early prevention of disease. The application results show that the system can recommend drugs and flag possible complications based on the data, and that I-Apriori has practical value for the effective use of medical big data.
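The level-wise scheme described in the abstract (local counting on each node, a global merge, and iteration until no candidates remain) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: in-memory lists stand in for Spark partitions, all function names are hypothetical, and lift is used as a stand-in interest measure because the abstract does not define its "degree of interest".

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({item for itemset in frequent for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    }


def frequent_itemsets(partitions, min_support):
    """Level-wise search: local counts per partition, then a global merge."""
    total = sum(len(p) for p in partitions)
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    supports, k = {}, 1
    while candidates:
        merged = Counter()
        for partition in partitions:  # each pass stands in for one child node
            merged.update(local_counts(partition, candidates))
        frequent = {c: n / total for c, n in merged.items() if n / total >= min_support}
        supports.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return supports


def strong_rules(supports, min_confidence, min_lift=1.0):
    """Keep rules passing a confidence threshold and a lift threshold.

    Lift is an ASSUMED interest measure; the thesis may define interest differently.
    """
    rules = []
    for itemset, support in supports.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                confidence = support / supports[antecedent]
                lift = confidence / supports[consequent]
                if confidence >= min_confidence and lift > min_lift:
                    rules.append((antecedent, consequent, confidence, lift))
    return rules


# Hypothetical toy data: two partitions of a five-transaction data set.
partitions = [
    [{"a", "b", "c"}, {"a", "b"}],
    [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
]
supports = frequent_itemsets(partitions, min_support=0.6)
for antecedent, consequent, confidence, lift in strong_rules(supports, 0.7, min_lift=0.0):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2), round(lift, 2))
```

In this toy data every rule that passes the confidence threshold has lift below 1, so the default `min_lift=1.0` rejects all of them. That illustrates why an interest filter can discard rules that support and confidence alone would accept.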

Keywords/Search Tags:

Apriori algorithm, association rules, frequent itemsets, parallelization, Spark, medical auxiliary diagnosis

Related items

1	Research On Optimization Of Association Rule Apriori Algorithm And Its Parallelization Based On Spark
2	An Algorithm And Context Analysis Of Mining Frequent Closet Itemsets
3	Association Rules Algorithm And Its Applications In Medical Data Mining
4	Research On Parallel Frequent Itemsets Mining Algorithm
5	Research On All Frequent Itemsets Mining Algorithm And Its Application To The Classification Area
6	Frequent Itemsets Incremental Mining And Parallelization Based On Multi-scale
7	Research On Top-K Frequent Itemsets Datamining Algorithm
8	Research On The Method Of Condensing Association Rules
9	Research And Application On Association Rules Based Data Mining
10	Association Rule Mining Technology Improvements In Computer Forensics