Optimization Research Of FP-Growth Algorithm For Medical Big Data On Cloud Platform

Posted on:2020-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:Z P Mao

Full Text:PDF

GTID:2404330578465833

Subject:Software engineering

Abstract/Summary:

With the growing informatization of the medical and health industry, medical data is "big" not only in volume, as big data is traditionally understood, but also in how widely it is integrated and how diverse its storage forms are. Medical big data has great potential value. Although China holds a large amount of such data, current data mining and analysis remain insufficient, so much of the information held in hospitals is still "silent". Effectively mining this growing mass of medical data is therefore particularly important. This thesis uses the Hadoop platform to study and improve an algorithm for mining association rules.

Since Jiawei Han proposed the FP-Growth algorithm, many scholars in China and abroad have studied it and proposed improved algorithms, such as the HPFP and MR-VER algorithms. However, some problems remain: when the dataset is too large, an in-memory FP-tree cannot be constructed, and mining requires repeated traversals of the global FP-tree, which wastes resources. To solve this problem, the thesis proposes PL-FPgrowth, an algorithm based on data partitioning that never generates a global FP-tree. It mines local FP-trees in parallel, so it does not need to construct an in-memory global FP-tree. When a node mines its local frequent itemsets, it does not need data held by other nodes, which reduces the communication overhead between nodes.

PL-FPgrowth uses the MapReduce parallel computing model, but it does not take local support into account when constructing and mining the local FP-trees. To address these remaining problems, the load-balancing LBPL-FPgrowth algorithm is proposed. It pre-prunes each local FP-tree using the minimum support count computed for that node, and when mining local frequent itemsets it retains only those that satisfy the local minimum support count. This reduces the space and time needed to construct and mine the local FP-trees and saves the communication overhead of transmitting infrequent itemsets between nodes. LBPL-FPgrowth also uses the MapReduce computing framework. Before the algorithm runs, the performance of each Hadoop cluster node is evaluated comprehensively, and, to account for performance differences among nodes, a load-balancing strategy is adopted to shorten the overall response time of the cluster.

Finally, several experiments were carried out with the PL-FPgrowth and LBPL-FPgrowth algorithms on the Hadoop platform. Comparing the experimental results verifies the validity and scalability of both algorithms and shows that LBPL-FPgrowth is the more efficient of the two.
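The abstract states that node performance is evaluated before mining and that each node prunes its local FP-tree with its own minimum support count, but it gives no formulas. A minimal sketch, assuming each node has a positive integer performance score (hypothetical) and that the local threshold is the global relative support scaled to the partition size, might look like this. Records are split in proportion to the scores with the largest-remainder method. The names `split_by_capacity` and `local_min_counts` are illustrative, not the thesis's code.

```python
import math
from fractions import Fraction


def split_by_capacity(num_records, scores):
    """Split num_records across nodes in proportion to integer performance scores.

    The largest-remainder method makes the shares sum exactly to num_records.
    """
    total = sum(scores)
    exact = [Fraction(num_records * s, total) for s in scores]
    shares = [math.floor(x) for x in exact]
    leftover = num_records - sum(shares)
    # Give the remaining records to the nodes with the largest fractional parts
    # (ties keep node order, because sorted() is stable even with reverse=True).
    order = sorted(range(len(scores)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def local_min_counts(shares, min_support):
    """Assumed local minimum support count per node: ceil(min_support * partition size)."""
    return [math.ceil(min_support * n) for n in shares]


if __name__ == "__main__":
    shares = split_by_capacity(1000, [5, 3, 2])                # hypothetical node scores
    print(shares, local_min_counts(shares, Fraction(3, 100)))  # 3% minimum support
```

With 1,000 records, scores in the ratio 5:3:2 and a 3% minimum support, the nodes receive 500, 300 and 200 records and prune with local counts of 15, 9 and 6. Using `Fraction` instead of floats keeps the ceiling exact, so a threshold such as 0.2 × 50 cannot drift to 11 through rounding error.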
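In PL-FPgrowth and LBPL-FPgrowth each node builds and mines a local FP-tree from its own partition only. The sketch below is a plain single-process Python version of that local step, not the thesis's MapReduce code. It assumes transactions arrive as `(items, count)` pairs and drops items below the node's local minimum support count before they are inserted into the tree, in the spirit of the pre-pruning the abstract describes. Mining follows the standard FP-Growth recursion over conditional pattern bases. The class and function names and the toy diagnosis records are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """FP-tree node: an item, its count, its parent and its children by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs, dropping items below min_count."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds the item
    for items, count in transactions:
        # Keep frequent items only, in descending support order (ties by name).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_count, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_count."""
    header, frequent = build_fp_tree(transactions, min_count)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield tuple(sorted(itemset)), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


# Hypothetical toy partition: one list of diagnoses per patient record.
records = [
    ["hypertension", "diabetes"],
    ["hypertension", "hyperlipidemia", "obesity", "asthma"],
    ["diabetes", "hyperlipidemia", "obesity", "anemia"],
    ["hypertension", "diabetes", "hyperlipidemia", "obesity"],
    ["hypertension", "diabetes", "hyperlipidemia", "anemia"],
]

if __name__ == "__main__":
    for itemset, count in fp_growth([(r, 1) for r in records], min_count=3):
        print(itemset, count)
```

With a local minimum count of 3, these five toy records yield eight frequent itemsets, for example `("hyperlipidemia", "obesity")` with support 3, while `("hypertension", "obesity")` (support 2) and the rare items `asthma` and `anemia` are pruned.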
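The abstract does not say how the locally frequent itemsets from different nodes are combined into a global result. One well-known way to do this, swapped in here as an assumption rather than taken from the thesis, is the two-pass SON (partition) approach. The union of every node's locally frequent itemsets forms the candidate set, and a second pass counts each candidate over all partitions to obtain exact global supports. If every local threshold is the global relative support times the partition size, no globally frequent itemset can be missing from the candidates, because an itemset below its threshold in every partition is also below the global threshold. The function names below are hypothetical.

```python
from collections import Counter
from itertools import chain


def merge_candidates(local_results):
    """Pass 1 (reduce side): union of the itemsets each node found locally frequent."""
    return set(chain.from_iterable(local_results))


def count_globally(partitions, candidates, min_count):
    """Pass 2: count every candidate over all partitions and keep the frequent ones."""
    counts = Counter()
    for partition in partitions:  # in MapReduce, roughly one map task per partition
        for record in partition:
            items = set(record)
            for candidate in candidates:
                if items.issuperset(candidate):
                    counts[candidate] += 1
    return {c: n for c, n in counts.items() if n >= min_count}


# Hypothetical example: 6 records, global min support 1/2 -> global count 3;
# each partition holds 3 records, so its local count is ceil(1.5) = 2.
partitions = [
    [["a", "b"], ["a", "b"], ["c"]],
    [["a"], ["b", "c"], ["b", "c"]],
]
# Itemsets reaching the local count of 2 in each partition.
local_results = [{("a",), ("b",), ("a", "b")}, {("b",), ("c",), ("b", "c")}]

if __name__ == "__main__":
    candidates = merge_candidates(local_results)
    print(count_globally(partitions, candidates, min_count=3))
```

Here `("a", "b")` and `("b", "c")` are locally frequent in one partition each but reach only 2 of the 3 required occurrences overall, so the second pass discards them and only the single items `a`, `b` and `c` survive.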

Keywords/Search Tags:

Medical big data, FP-Growth algorithm, Hadoop, data partitioning

Related items

1	Analysis And Research Application Of Hyperthyroidism Disease Model Based On Medical Big Data
2	A Study On Collection And Application Of Glaucoma Clinical Cases For Big Medical Data
3	Research On Optimization Of FP-Growth Algorithm Based On Cloud Computing And Medical Big Data
4	Chinese Parallel LDA Algorithm Based On Hadoop And Data Mining In Electronic Medical Records
5	Research And Improvement Of Apriori Algorithm For Medical Cloud Data Based On Hadoop
6	Analysis And Research Of Tumor Mode Based On Medical Big Data
7	Research On Medical Insurance Data Mining Based On Hadoop
8	Design And Implementation Of Data Processing And Analysis Of Rehabilitation Equipment Based On Big Data
9	Design And Implementation Of ECG Data Acquisition And Storage System Based On Hadoop
10	Design And Implementation Of Medical Big Data Analysis And Prediction System Based On Regression Model