
Research On Data Mining Under Cloud Computing Platform

Posted on: 2014-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Fu
Full Text: PDF
GTID: 2248330395483979
Subject: Computer application technology
Abstract/Summary:
Information on the modern Internet has very rich commercial value. Extracting useful information and knowledge from this data accurately and at high speed lets enterprises step ahead in a highly competitive market and gain commercial success and economic benefit. However, data mining was initially used to handle small amounts of data, and it becomes a very time-consuming process as the size of the input data increases. The explosive growth of Internet data has now reached the point where a single computer can hardly handle it.

A cloud computing platform, with its very high scalability, is ideal for handling large-scale data. Its storage and computing power can be increased by dynamically adding compute nodes to the platform. If traditional data mining algorithms are transformed accordingly and deployed to a cloud computing platform, the problem of large-scale Internet data mining can be solved.

This thesis begins with an analysis of the theory behind cloud computing platforms: the Google File System, the distributed programming model map-reduce, the distributed data storage system BigTable, and the framework structure of Hadoop, a widely used open source cloud computing platform. Then, taking logistic regression and association rules as examples, it proposes improved algorithms that can be applied to a cloud computing platform. Finally, the improved algorithms were tested on the Hadoop platform, and the tests found that the cost of the algorithms decreases linearly with the size of the Hadoop cluster.
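The abstract does not give the improved algorithms themselves, so the following is only a minimal sketch of the general idea of expressing logistic regression in map-reduce form, not the thesis's method. It assumes batch gradient descent, where each map task computes a partial gradient over its own data partition and a reduce step sums the partials. All names (`map_partition`, `reduce_partials`, `train`) are hypothetical, and the code runs in a single Python process rather than on Hadoop.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_partition(weights, partition):
    """Map step (hypothetical): partial log-loss gradient over one partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad, len(partition)


def reduce_partials(a, b):
    """Reduce step (hypothetical): sum two partial gradients and their counts."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    return [x + y for x, y in zip(grad_a, grad_b)], n_a + n_b


def train(partitions, dim, lr=0.5, iterations=200):
    """Driver loop: one map-reduce round per gradient descent iteration."""
    weights = [0.0] * dim
    for _ in range(iterations):
        # On a real cluster each partition would be processed on its own node.
        partials = [map_partition(weights, p) for p in partitions]
        grad, n = reduce(reduce_partials, partials)
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Toy data, assumed for illustration: features are [bias, x], label is 1 when x > 0.
partitions = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
weights = train(partitions, dim=2)
```

Because the gradient is a sum over records, splitting the data across more partitions changes only where the partial sums are computed, not the result. That property is what makes this kind of algorithm a natural fit for adding compute nodes.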
Keywords/Search Tags:cloud computing, map-reduce, data mining, hadoop