
Optimization Research Of FP-Growth Algorithm For Medical Big Data On Cloud Platform

Posted on: 2020-03-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z P Mao
GTID: 2404330578465833 | Subject: Software engineering
Abstract/Summary:
With the growth of informatization in the medical and health industry, medical data is no longer "big" only in the traditional sense of volume: it is also more widely integrated and stored in more diverse forms. Large-scale medical data has great potential value. Although China holds a large amount of such data, it is not yet mined and analyzed sufficiently, so much of the information held by hospitals remains "silent". Mining this growing mass of medical data effectively is therefore particularly important. This thesis uses the Hadoop platform to study and improve an association rule mining algorithm.

Since Han Jiawei proposed the FP-Growth algorithm, many scholars in China and abroad have studied it and proposed improved versions, such as the HPFP and MR-VER algorithms. Some problems remain, however: when the data scale is too large, an in-memory FP-tree cannot be constructed, and repeatedly traversing the global FP-tree wastes resources. To address this, the thesis proposes PL-FPgrowth, an algorithm based on data partitioning that does not generate a global FP-tree. It mines local FP-trees in parallel, which avoids having to construct an in-memory global FP-tree. When mining local frequent items, a node does not need the data held by other nodes, which reduces the communication overhead between nodes.

PL-FPgrowth uses the MapReduce parallel computing model, but it does not consider local support when constructing and mining local FP-trees. To solve this remaining problem, the load-balancing LBPL-FPgrowth algorithm is proposed. It pre-prunes each local FP-tree based on the minimum support count calculated for the node and, when mining local frequent itemsets, retains only those that satisfy the local minimum support count. This reduces the space and time needed to construct and mine local FP-trees and saves the communication overhead of transmitting infrequent itemsets between nodes. LBPL-FPgrowth also uses the MapReduce computing framework. Before the algorithm runs, the performance of the Hadoop cluster nodes is evaluated comprehensively, and a load-balancing strategy that accounts for performance differences among nodes is adopted to shorten the overall response time of the cluster. Finally, several experiments on the Hadoop platform compare PL-FPgrowth and LBPL-FPgrowth. The results verify the validity and scalability of both algorithms and show that LBPL-FPgrowth is the more efficient of the two.
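
The partitioning and local-support ideas described above can be illustrated with a small sketch. It is not the thesis's PL-FPgrowth or LBPL-FPgrowth implementation: it swaps the FP-tree for brute-force itemset counting, runs the partitions in a plain loop rather than as MapReduce tasks, and adds a global recount step taken from the classic partition approach to frequent itemset mining. All names (`split_by_weight`, `local_frequent`, `mine`) and the `weights` parameter, which stands in for node performance scores, are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def split_by_weight(transactions, weights):
    """Split transactions into contiguous partitions sized in proportion
    to `weights` (a stand-in for per-node performance scores)."""
    total = sum(weights)
    parts, start, acc = [], 0, 0
    for w in weights:
        acc += w
        end = round(len(transactions) * acc / total)
        parts.append(transactions[start:end])
        start = end
    return parts


def local_frequent(partition, min_ratio, max_len=3):
    """Return itemsets whose count in this partition meets the local
    minimum support count, min_ratio * len(partition).
    Brute-force counting is used here in place of a local FP-tree."""
    if not partition:
        return set()
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            counts.update(combinations(items, k))
    local_min = min_ratio * len(partition)  # exact, since min_ratio is a Fraction
    return {s for s, c in counts.items() if c >= local_min}


def mine(transactions, weights, min_ratio, max_len=3):
    """Mine each partition independently, then recount the union of the
    local results over all transactions (assumed second pass)."""
    candidates = set()
    for part in split_by_weight(transactions, weights):
        candidates |= local_frequent(part, min_ratio, max_len)

    global_counts = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if items.issuperset(cand):
                global_counts[cand] += 1

    global_min = min_ratio * len(transactions)
    return {s: c for s, c in global_counts.items() if c >= global_min}
```

Scaling the threshold by partition size is what keeps the result complete in this sketch: an itemset that reaches the global minimum support must reach the proportional local threshold in at least one partition, so it always appears among the candidates. Passing `min_ratio` as a `Fraction` (for example `Fraction(1, 2)`) keeps the threshold comparison exact.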
Keywords/Search Tags:Medical big data, FP-Growth algorithm, Hadoop, data partitioning