
Research And Application Of Parallel FP-Growth Mining Algorithm Based On Cloud Computing Platform

Posted on: 2019-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gao
GTID: 2428330596450459
Subject: Engineering
Abstract/Summary:
With the advent of big data, data processing has become a difficult problem: massive data resources cannot be effectively analyzed and utilized, mainly because traditional processing methods no longer apply to big data scenarios. In recent years, cloud computing technology has developed rapidly and has given rise to many excellent cloud computing platforms. These platforms provide an effective solution for big data processing and can serve as a runtime basis for studying parallel mining algorithms and developing algorithm applications. First, this thesis deploys the core components of Hadoop and Spark on multiple computers to build a cloud computing platform that manages the cluster's storage space and computing power in a unified way and provides the conditions for implementing and running parallel data mining algorithms. Association rule mining is a class of data mining algorithms with high practical value; it is applied in many industries, especially in extracting value from medical data. Therefore, this thesis chooses the widely used parallel FP-Growth algorithm as its research object. To address the high time complexity of the FP-Growth algorithm, an optimized header table structure is proposed to reduce the processing time of FP-Growth on a single node. To address the load imbalance that arises during parallelization, an improved workload model is introduced into the balanced grouping strategy to balance the load among groups, so as to avoid idle nodes, wasted resources, and other problems caused by an uneven distribution of computing tasks, thereby enhancing the ability of the parallel FP-Growth algorithm to handle massive data. On the cloud computing platform built for this work, the thesis also designs comparative experiments for the FP-Growth algorithm with the optimized header table structure and for the parallel FP-Growth algorithm with the optimized load balancing strategy, and verifies the superiority of the optimized algorithms. Finally, this thesis applies the optimized parallel FP-Growth algorithm to mine association rules in medical data to verify its effectiveness in practice, and designs an easy-to-use medical data mining system to help non-professionals analyze association rules.
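
The balanced grouping idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the thesis's improved workload model, so the estimate used here (item support multiplied by the item's rank in the frequency-sorted list) is only an assumed stand-in, and the function names `frequent_items`, `estimated_load`, and `balanced_groups` are hypothetical. The sketch assigns each frequent item to whichever group currently has the smallest estimated total load, a standard greedy heuristic and not necessarily the strategy used in the thesis.

```python
import heapq
from collections import Counter


def frequent_items(transactions, min_support):
    """Return (item, support) pairs sorted by descending support."""
    counts = Counter(item for t in transactions for item in set(t))
    flist = [(item, c) for item, c in counts.items() if c >= min_support]
    # Break ties on the item name so the order is deterministic.
    flist.sort(key=lambda pair: (-pair[1], pair[0]))
    return flist


def estimated_load(support, rank):
    """Assumed workload model (not from the thesis): support times 1-based rank."""
    return support * rank


def balanced_groups(flist, num_groups):
    """Greedily place each item into the group with the smallest total load."""
    loads = [
        (estimated_load(support, rank), item)
        for rank, (item, support) in enumerate(flist, start=1)
    ]
    # Heaviest items first, so the small items can even out the groups later.
    loads.sort(key=lambda pair: (-pair[0], pair[1]))

    heap = [(0, g) for g in range(num_groups)]  # (total load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]
    totals = [0] * num_groups

    for load, item in loads:
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        totals[g] = total + load
        heapq.heappush(heap, (totals[g], g))
    return groups, totals


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"],
            ["a", "b", "d"], ["b", "c"], ["a", "e"]]
    groups, totals = balanced_groups(frequent_items(demo, min_support=2), 2)
    print(groups, totals)
```

In a parallel FP-Growth setting, each resulting group would be mined by one worker, so the spread between the largest and smallest entries of `totals` gives a rough indication of how evenly the mining work is distributed under the assumed model.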
Keywords/Search Tags:big data, cloud computing platform, FP-Growth algorithm, header table, parallelization, load balance