Font Size: a A A

Research On The Performance Of Regression Algorithm Based On The Hadoop Platform

Posted on:2013-12-07Degree:MasterType:Thesis
Country:ChinaCandidate:L YuFull Text:PDF
GTID:2248330374479217Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Cloud computing is a new type of business computing model based on thedevelopment of Grid computing.In recent years,It become the concept of the world’smajor IT giants stir-fried. It can provide dynamic resource pools, virtualization andhigh availability computing platform. The development of cloud computing hasbrought new challenges and opportunities to data mining technology development.Cloud computing can use a lot of cheap computer clusters instead the high cost of theserver, Thus it greatly reduces the computational cost.The HADOOP is an Apache open source project which is used to build a cloudplatform. Using HADOOP framework can help us achieve computer clusters rapidly.On HADOOP platform, it use HDFS (Distributed File System) to store and correctlarge files, and it can use Map Reduce programming model to calculation. WhenHADOOP will be applied to Data Mining, there is a key problem that is how to makethe traditional Data Mining algorithms achieve parallelization. As for the traditionaldata mining algorithms, combined with the algorithm, we can confirm whether it isparallel only through deep research. To the algorithms which can achieve parallelly,combing Map Reduce programming model, we can move them to HADOOP platformto complete a variety of Data Mining tasks efficiently and parallelly.Robust locally weighted regression algorithm and Logistic regression analyses areincreasingly being used to predict Because robust locally weighted regressionalgorithm is faster compared with the general linear regression techniques, and itprovides a universal curve fitting, it can fit no matter how complex curves. The fasterthe speed of training, learning complex objective function, and never losing dataeasily are all its advantages; In statistics, logistic regression is a type of regressionanalysis used for predicting the outcome of a categorical criterion variable based on one or more predictor variables.This paper first described in detail the core framework and operational mechanismof cloud computing and HADOOP platform. And then combing the traditional datamining system, I present the technical structure based on the HADOOP data miningplatform. Finally, I have improved traditional strong locally weighted regressionalgorithm and MapReduce Logistic regression on the HADOOP platform. Then validity ofthe method is proved by experiments.
Keywords/Search Tags:Data Mining, Cloud Computing, Map Reduce, robust locally weightedregression, Logistic regression analyses
PDF Full Text Request
Related items