Research On The Parallel Data Mining Strategy Under The Cloud Computing Environment

Posted on: 2012-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2218330338963064
Subject: Computer software and theory
Abstract/Summary:
Cloud computing is a commercial computing model that distributes computing tasks across a large number of computers in a resource pool and provides users with computing power, storage capacity and application services on demand. It offers a cheap and efficient way to store and analyze massive data. Data mining is the process of discovering interesting, non-trivial, implicit, previously unknown and potentially useful information or patterns in large databases. It guides scientific research, business decisions and work in other fields, and has far-reaching social and economic significance. Research on data mining strategies based on cloud computing is therefore important from both a theoretical and a practical point of view. This thesis studies the parallel data mining strategy under the cloud computing environment from three aspects: dataset division, dataset allocation, and parallel data mining algorithms based on MapReduce. It first introduces the concepts and techniques of cloud computing and data mining; the existing dataset division methods, parallel mechanisms and parallel strategies of parallel data mining; and the existing parallel association rule mining, parallel clustering and parallel classification algorithms. It then designs an improved parallel data mining strategy for the cloud computing environment, comprising dataset division, dataset allocation and an improved Apriori algorithm, and it designs the procedure for running the improved Apriori algorithm on Hadoop MapReduce.
Finally, it builds a Hadoop platform and uses it to test the functionality and performance of the improved algorithm. The test results show that, with the parallel data mining strategy designed in this thesis, the improved algorithm achieves higher efficiency when mining frequent itemsets in the cloud computing environment. The research results have practical and reference value in the fields of cloud computing and massive data mining.
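The abstract does not give the details of the improved Apriori algorithm, so the following is only a minimal sketch of the general idea it names: divide the dataset, count candidate itemsets per partition (the map step), and merge the counts against a support threshold (the reduce step). It is plain textbook Apriori simulated in ordinary Python, not Hadoop code and not the thesis's improved algorithm. All function names, the round-robin division and the sample data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def split_dataset(transactions, n_parts):
    """Dataset division: round-robin split into n_parts partitions (an assumed scheme)."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def map_count(partition, k, candidates=None):
    """Map step: count the k-itemsets that occur in one partition.

    If candidates is given, only itemsets in that set are counted.
    """
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            if candidates is None or itemset in candidates:
                counts[itemset] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum per-partition counts and keep itemsets meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def gen_candidates(frequent_prev, k):
    """Build k-itemset candidates whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({item for itemset in frequent_prev for item in itemset})
    return {
        cand
        for cand in combinations(items, k)
        if all(sub in frequent_prev for sub in combinations(cand, k - 1))
    }


def frequent_itemsets(transactions, min_support, n_parts=2):
    """Run level-wise Apriori, one simulated map/reduce round per itemset size."""
    parts = split_dataset(transactions, n_parts)
    result, k, candidates = {}, 1, None
    while True:
        partial = [map_count(p, k, candidates) for p in parts]  # parallel on a real cluster
        frequent = reduce_counts(partial, min_support)
        if not frequent:
            break
        result.update(frequent)
        k += 1
        candidates = gen_candidates(frequent, k)
        if not candidates:
            break
    return result


if __name__ == "__main__":
    sample = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for itemset, support in sorted(frequent_itemsets(sample, min_support=2).items()):
        print(itemset, support)
```

In this sketch the merged counts do not depend on how the transactions are partitioned, which is what allows the counting work to be spread across machines; the division and allocation schemes only affect load balance and running time.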
Keywords/Search Tags: Data Mining, Parallel, Cloud Computing