
Research About Data Mining Technologies Based On Cloud Computing

Posted on: 2013-07-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Li
GTID: 2268330422458065 | Subject: Computer application technology
Abstract/Summary:
Cloud computing is a business computing model. It distributes computing tasks across a large pool of computers and gives users computing power, storage capacity and application services on demand. It also offers a cheap and efficient way to store and analyze massive data. Data mining is the process of discovering information or patterns in large databases that are interesting, non-trivial, implicit, previously unknown and potentially useful. It guides scientific research, business decision-making and other fields, and it has far-reaching social and economic significance. Data mining needs large amounts of computing and storage resources. Integrating it with cloud computing can therefore control computing cost, improve mining efficiency and break through the bottleneck that limits traditional data mining. For these reasons, research on cloud-based data mining strategies matters from both a theoretical and a practical point of view.

Hadoop is the best-known open-source distributed computing framework. It uses the MapReduce parallel programming model to combine the computing and storage capacity of a cluster and so provides powerful distributed computing capabilities. This thesis focuses on the following work:
1. It introduces related concepts and technologies in cloud computing and data mining. It then analyzes the advantages and disadvantages of Apriori, the classic association rule mining algorithm, and of several improved variants. It also covers the Hadoop platform and the MapReduce programming model.
2. It recasts the Apriori algorithm in the MapReduce model and improves its parallelization to raise Apriori's performance on the Hadoop framework. The final goal is a highly scalable MapReduce_Apriori algorithm suited to cloud computing environments.
3. It applies the improved algorithm to the analysis of insurance policy data sets. The results show that it processes massive data significantly more efficiently than the traditional algorithm and achieves good speedup.
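The work items above center on running Apriori's level-wise support counting as MapReduce jobs. The Python sketch below illustrates the common one-pass-per-itemset-size scheme under stated assumptions. It is not the thesis's MapReduce_Apriori and does not reproduce its specific improvements. Map and reduce run in a single process, and each inner list stands in for one input split. The function names (`map_phase`, `reduce_phase`, `generate_candidates`, `mapreduce_apriori`) and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, candidates):
    """Mapper for one input split: count each candidate itemset contained in
    the split's transactions (local aggregation, as a combiner would do)."""
    local = Counter()
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of this transaction
                local[cand] += 1
    return local


def reduce_phase(partial_counts):
    """Reducer: sum the per-split counts of each candidate itemset."""
    total = Counter()
    for local in partial_counts:
        total.update(local)  # Counter.update adds counts together
    return total


def generate_candidates(frequent_prev, k):
    """Apriori join and prune: size-k unions of frequent (k-1)-itemsets
    whose every (k-1)-subset is itself frequent."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def mapreduce_apriori(splits, min_support):
    """Return {itemset: support count} for every itemset whose count reaches
    min_support, using one simulated MapReduce pass per itemset size k."""
    candidates = {frozenset([item]) for split in splits for t in split for item in t}
    frequent = {}
    k = 1
    while candidates:
        # On Hadoop, each split would be handled by its own map task in parallel.
        counts = reduce_phase(map_phase(split, candidates) for split in splits)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level, k)
    return frequent


# Hypothetical toy data: three input splits of market-basket transactions.
splits = [
    [["bread", "milk"], ["bread", "diapers", "beer", "eggs"]],
    [["milk", "diapers", "beer", "cola"], ["bread", "milk", "diapers", "beer"]],
    [["bread", "milk", "diapers", "cola"]],
]
frequent = mapreduce_apriori(splits, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 3, the sketch keeps four single items and four pairs. The only 3-itemset candidate, {bread, milk, diapers}, occurs twice and is dropped. In this simple scheme every level rereads the whole dataset, so the number of jobs grows with the length of the longest frequent itemset. Reducing that per-level overhead is a common goal of improved parallel Apriori variants.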
Keywords/Search Tags: Cloud Computing, Data Mining, Association Rules, Parallel Computing