Research On Optimization Of Eclat Algorithm Based On Cloud Computing And Medical Big Data

Posted on:2021-03-10

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Lin

Full Text:PDF

GTID:2404330611466807

Subject:Applied Mathematics

Abstract/Summary:

With the advance of medical informatization, the volume of medical data grows day by day. On data of this scale, traditional association rule mining algorithms take too long to run. Cloud computing platforms offer an effective way to address this problem. This thesis studies and optimizes Eclat (Equivalence Class Transformation), an association rule mining algorithm, proposes the improved R-Eclat algorithm, parallelizes R-Eclat with the Spark cloud computing framework, and applies the parallel algorithm to medical big data. The main work is as follows:

1. Study and optimization of the Eclat algorithm. As the transaction set in the database grows, Eclat runs into time and space complexity problems. Using the Apriori property of association rule mining (every subset of a frequent itemset must itself be frequent), an optimization is proposed for the join step of Eclat that removes some duplicate or infrequent candidate itemsets, yielding the improved algorithm R-Eclat. Its effectiveness is verified by comparison with the original algorithm on different types of public datasets. R-Eclat runs faster than the original algorithm, with an efficiency gain of up to 20%, and the gain is more pronounced on sparse datasets than on dense ones.

2. Parallelization of R-Eclat based on Spark RDDs. To address the limitations of the algorithm in a serial environment, a parallelization scheme built on Spark RDD operators is proposed. The scheme adds a triangular accumulation matrix to the step in which R-Eclat intersects frequent itemsets, which optimizes the filtering of candidate frequent itemsets. The parallelized R-Eclat algorithm is then implemented on a purpose-built Spark cluster. In a comparison with the Spark-based YAFIM algorithm, and in experiments that vary the number of computing nodes in the cluster, R-Eclat shows some efficiency improvement over YAFIM and scales well with the number of computing nodes in the Spark cluster environment.

3. Application of parallelized R-Eclat to a diabetes dataset. Because the algorithm uses a triangular matrix as an accumulator, the attribute items of the dataset are mapped to entries in an item-number table. The dataset is split into subsets of different sizes, and the parallel algorithm is compared with the algorithm in the serial environment. The experimental results show that the efficiency gain is more pronounced at larger data scales, and the mined association rules indicate that the glycated hemoglobin (HbA1c) test can help determine whether diabetic patients need to be readmitted to hospital.
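The abstract does not spell out the exact R-Eclat pruning rule, so the following is only a minimal serial sketch of the general idea it describes: Eclat over a vertical (item to transaction-id set) layout, with an Apriori-style subset check in the join step that skips a candidate before its tidsets are intersected. The function names `vertical_format` and `eclat_pruned`, the level-wise traversal, and the toy transactions are all assumptions made for illustration; they are not taken from the thesis, and the Spark parallelization and triangular matrix accumulator are not shown.

```python
from itertools import combinations


def vertical_format(transactions):
    """Map each item to the set of transaction ids (tidset) containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat_pruned(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Hypothetical sketch: itemsets are kept as sorted tuples and mined level
    by level, which is simpler to read than the usual depth-first Eclat.
    """
    tidsets = vertical_format(transactions)
    # Level 1: frequent single items with their tidsets.
    level = {
        (item,): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    frequent = {}
    while level:
        frequent.update({frozenset(k): len(v) for k, v in level.items()})
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            # Join step: only combine itemsets in the same prefix class.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Apriori check: skip the intersection if any k-subset of the
            # candidate is not frequent at the current level.
            if any(sub not in level
                   for sub in combinations(candidate, len(a))):
                continue
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        level = next_level
    return frequent


if __name__ == "__main__":
    # Toy transactions, invented purely for illustration.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in eclat_pruned(data, min_support=3).items():
        print(sorted(itemset), count)
```

Because the subset check runs before the set intersection, a candidate with an infrequent subset never pays for the intersection, which is the kind of saving the join-step optimization aims at.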

Keywords/Search Tags:

Eclat, Cloud Computing, Parallel Computing, Medical Big Data

Related items

1	Matrix Computation And Its Application In Simulation Of Hemodynamics Based Cloud Computing Platform
2	Application And Exploration Of Collaborative Medical Care Based On Cloud Computing
3	Key Technologies Study On Medical Cloud For Big Medical Data Processing
4	Mobile Medical Follow-up System Based On Cloud Computing
5	Research And Implementation Of Rural Basic Medical Information System Based On Cloud Computing
6	Research And Practice Of Medical Imaging Cloud Services Platform
7	Big Data-driven Medical CPS Modeling Method Based On Cloud Platform
8	Research And Improvement Of Apriori Algorithm For Medical Cloud Data Based On Hadoop
9	The Design And Implementation Of Medical Device Testing Platform Based On Cloud Computing
10	Brain MR Images Of Cloud Computing And Visualization-based Research And Realization