Research And Design Of Data Mining System For TCM Disease Based On Cloud Computing Environment

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2348330566466104
Subject: Computer technology
Abstract/Summary:
The rapid growth of data in recent years poses a major challenge for traditional single-machine data processing. How to combine data mining techniques with powerful data processing capacity to extract valuable information from such massive data has become a key problem in data mining, and the emergence of cloud computing offers an effective way to address it. To overcome the data storage and computation problems faced in traditional Chinese medicine (TCM), this thesis first studies the Hadoop and Spark platforms in depth, both of which can process large data sets efficiently. Against this background, and because traditional FP-Growth requires frequent recursive mining when constructing the FP-Tree, an improved, parallelized scheme is designed: the data set is stored across nodes using the distributed storage of HDFS, the data on each node is counted and grouped following a "divide and conquer" strategy, and FP-Growth is then run locally on each group, which improves the overall processing efficiency of the algorithm. The algorithm is then run on a Hadoop cluster and on a Spark cluster. The results show that both have high computing efficiency and data processing capability. For algorithms that must iterate many times, however, Spark's in-memory RDD abstraction is more efficient, and its advantage becomes more pronounced as the amount of data increases. Finally, building on this work, a medical data mining system that integrates centralized data management and data mining functions in a cloud computing environment is researched and designed, its functions are analyzed and designed in detail, and the parallel FP-Growth algorithm is applied to TCM asthma data. Analysis of the mining results reveals rules relating disease and syndrome type, disease and prescription, and prescription compatibility. The results show that the improved, parallelized algorithm greatly improves computational efficiency, and that the mining results are consistent with the practical diagnostic experience of traditional Chinese medicine, giving them reference value and an auxiliary role in TCM clinical diagnosis.
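The count, group, and local-mining steps described in the abstract can be illustrated with a small single-process sketch. This is not the thesis's implementation: the function names, the round-robin grouping rule (`rank % num_groups`), and the toy records are all assumptions made for illustration, plain Python lists stand in for HDFS partitions, and the local step uses a simple conditional-database recursion in place of an actual FP-Tree.

```python
from collections import Counter, defaultdict


def count_items(partitions):
    """Count item supports per partition, then merge (the 'count' step)."""
    total = Counter()
    for part in partitions:
        total.update(Counter(i for t in part for i in set(t)))
    return total


def build_rank(counts, min_support):
    """Rank frequent items by descending support (ties broken by name)."""
    frequent = [i for i, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda i: (-counts[i], i))
    return {item: r for r, item in enumerate(frequent)}


def shard_transactions(partitions, rank, num_groups):
    """Send each group the prefix ending at its right-most item (the 'group' step)."""
    shards = defaultdict(list)
    for part in partitions:
        for t in part:
            ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
            seen = set()
            for pos in range(len(ordered) - 1, -1, -1):
                g = rank[ordered[pos]] % num_groups  # assumed grouping rule
                if g not in seen:
                    seen.add(g)
                    shards[g].append(ordered[:pos + 1])
    return shards


def mine(cond_db, suffix, min_support, allowed, out):
    """Pattern growth on conditional databases (stands in for FP-Tree mining)."""
    counts = Counter(i for t in cond_db for i in t)
    for item, c in counts.items():
        if c < min_support or (allowed is not None and item not in allowed):
            continue
        pattern = suffix | {item}
        out[frozenset(pattern)] = c
        # Keep only the part of each transaction that precedes `item`.
        projected = [t[:t.index(item)] for t in cond_db if item in t]
        mine(projected, pattern, min_support, None, out)


def parallel_fp_growth(partitions, min_support, num_groups):
    rank = build_rank(count_items(partitions), min_support)
    shards = shard_transactions(partitions, rank, num_groups)
    result = {}
    for g, db in shards.items():  # each shard could be mined on its own node
        allowed = {i for i, r in rank.items() if r % num_groups == g}
        mine(db, frozenset(), min_support, allowed, result)
    return result


# Made-up toy records (not data from the thesis), split over two "nodes".
partitions = [
    [["asthma", "cough", "ephedra"], ["asthma", "ephedra", "licorice"],
     ["asthma", "cough"]],
    [["asthma", "ephedra", "licorice"], ["cough", "licorice"]],
]
result = parallel_fp_growth(partitions, min_support=3, num_groups=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Because each group receives every prefix that ends in one of its own items, a group can compute the exact global support of its patterns without consulting other groups, which is what allows the local mining step to run independently on each node.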
Keywords/Search Tags: Application of data mining in TCM, Parallel FP-Growth algorithm, Hadoop, Spark