
Research On Apriori Algorithm Based On Medical Big Data In Cloud Environment

Posted on: 2019-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: X N Fu
GTID: 2428330548970317
Subject: Computer application technology
Abstract/Summary:
As medical and health services advance, hospitals generate ever more medical data, and the valuable information contained in this medical big data needs to be mined. Its sheer volume makes conventional data mining methods no longer applicable. How to use data mining technology to analyze these huge datasets, uncover valuable patterns, and support disease prevention and treatment is therefore a pressing problem. Cloud computing provides an important technical foundation for this goal. The distributed storage and computing capabilities of Hadoop, an open-source cloud computing framework, have made it a mainstream solution to this problem, and it is expected to give strong support to the development of medical data mining technology. Against this background, this thesis summarizes Hadoop technology and related data mining knowledge, analyzes Hadoop-based data mining systems in detail, and studies typical association rule algorithms. The specific research work covers the following aspects.

First, the traditional Apriori algorithm must scan the database many times and runs inefficiently in serial form. It is improved here with a partitioning approach, and in each iteration the number of comparisons is reduced by checking transaction length. The improved algorithm is then redesigned for the MapReduce model, yielding the P_Apriori_BP algorithm for cloud environments, whose performance is analyzed. The analysis shows that P_Apriori_BP effectively reduces the number of database scans and overcomes the inefficiency of serial execution.

Second, to address P_Apriori_BP's inefficiency in generating frequent itemsets, the transaction database is transformed into a Boolean matrix; the transaction storage scheme is improved, the rows and columns of the matrix are compressed, and the algorithm's termination condition is optimized. The result is ported to the Hadoop platform and parallelized with MapReduce, yielding the Apriori_PBCM algorithm for cloud environments, whose performance is also analyzed. The analysis shows that Apriori_PBCM simplifies support counting, effectively reduces the transaction size and the number of iterations, and overcomes the low efficiency of frequent itemset generation.

Finally, P_Apriori_BP and Apriori_PBCM are verified experimentally on the Hadoop platform and their efficiency is compared, which demonstrates their effectiveness and superiority.
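The abstract names three mechanisms without detailing them: partitioned, MapReduce-style support counting; skipping transactions too short to contain a level-k candidate; and a Boolean transaction-item matrix whose rows and columns are compressed between iterations. The Python sketch below illustrates these general ideas on a toy dataset. It is not the thesis's P_Apriori_BP or Apriori_PBCM implementation: the function names (`map_partition`, `reduce_counts`, `compress`, `apriori_boolean_matrix`), the `n_partitions` parameter, and the in-process simulation of the map and reduce phases (no Hadoop involved) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_partition(rows, items, candidates):
    """Hypothetical mapper: count candidate itemsets inside one partition of the Boolean matrix."""
    col = {item: j for j, item in enumerate(items)}
    counts = Counter()
    for row in rows:
        for cand in candidates:
            # A row supports a candidate iff every one of its item columns is True (logical AND).
            if all(row[col[i]] for i in cand):
                counts[cand] += 1
    return counts


def reduce_counts(partials):
    """Hypothetical reducer: sum the partial counts emitted by all mappers."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total


def compress(rows, items, frequent, k):
    """Compress the matrix before level k+1 (illustrative, not the thesis's exact scheme).

    Columns: an item in no frequent k-itemset cannot be in any frequent (k+1)-itemset.
    Rows: a transaction with fewer than k+1 remaining items cannot contain a (k+1)-candidate,
    so dropping it (a transaction-length check) saves comparisons without changing any count.
    """
    keep = sorted({i for itemset in frequent for i in itemset})
    idx = [items.index(i) for i in keep]
    new_rows = [[row[j] for j in idx] for row in rows]
    return [r for r in new_rows if sum(r) >= k + 1], keep


def apriori_boolean_matrix(transactions, min_support, n_partitions=2):
    """Return {itemset tuple: support count} for every itemset with count >= min_support."""
    items = sorted({i for t in transactions for i in t})
    rows = [[i in t for i in items] for t in transactions]  # rows = transactions, cols = items
    result = {}
    candidates = [(i,) for i in items]
    k = 1
    while candidates and rows:
        # Split rows into partitions, run one simulated "mapper" per partition, then "reduce".
        size = -(-len(rows) // n_partitions)  # ceiling division
        parts = [rows[p:p + size] for p in range(0, len(rows), size)]
        counts = reduce_counts(map_partition(part, items, candidates) for part in parts)
        frequent = sorted(c for c in candidates if counts[c] >= min_support)
        result.update((c, counts[c]) for c in frequent)
        rows, items = compress(rows, items, frequent, k)
        # Join frequent k-itemsets that share their first k-1 items; prune by the Apriori property.
        fset = set(frequent)
        candidates = [
            a + (b[-1],)
            for a, b in combinations(frequent, 2)
            if a[:-1] == b[:-1]
            and all(s in fset for s in combinations(a + (b[-1],), k))
        ]
        k += 1
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    print(apriori_boolean_matrix(db, min_support=3))
    # {('a',): 4, ('b',): 4, ('c',): 4, ('a', 'b'): 3, ('a', 'c'): 3, ('b', 'c'): 3}
```

Because the dropped rows and columns can never contribute to a (k+1)-itemset, compression leaves every support count unchanged and only shrinks the work per iteration. On a real Hadoop cluster the map and reduce steps would run as separate tasks over input splits, which this single-process sketch does not attempt.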
Keywords/Search Tags:Medical big data, Cloud computing, Hadoop, Apriori, Compression matrix