Incremental Learning Method Based On Cloud Computing

Posted on: 2013-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: M Li
Full Text: PDF
GTID: 2218330371457555
Subject: Computer application technology
Abstract/Summary:
The Internet has been widely used in many fields since its appearance, and the amount of data flowing through it is now at least 20 billion times what it was in 1969. At the same time, data is generated ever faster and in ever larger volumes, which is what is now called massive data. To feed newly produced data into a learning system, an incremental learning model must be adopted. How to efficiently mine the valuable and understandable knowledge that lies in massive data has become a new challenge in the field of incremental data mining.

In recent years, cloud computing has brought new opportunities for massive data mining. On the one hand, cloud computing can integrate computing resources across a wide area network and provide a physical basis for data mining. On the other hand, parallel computing technology is the core of cloud computing. With MapReduce, the distributed programming model of cloud computing, a program can be automatically distributed to large clusters composed of commodity computers and executed in parallel. In addition, Hadoop, an open source project of the Apache Software Foundation, makes it easy and quick to build computer clusters. Its HDFS file system can store large-scale data, and its MapReduce programming framework enables fast parallel computing. Designing incremental classification algorithms based on Hadoop for massive data mining is therefore of great significance.

To achieve incremental learning on the Hadoop cloud computing platform, the key question is how to parallelize traditional incremental learning algorithms. In this thesis, we analyze the mechanism of the MapReduce framework and the characteristics of traditional incremental learning algorithms. The idea of ensemble learning is then integrated into incremental learning, and two incremental classification algorithms based on the cloud computing platform are presented. In both algorithms, the Map stage trains the base classifiers, and different Map tasks can be executed with a high degree of parallelism. Depending on whether concept drift occurs in the learning environment, the Reduce stage integrates the results of the base classifiers by either classifier combination or classifier selection. Experiments on several data sets (KDD Cup 2010, Hyperplane simulation, and so on) show the correctness and feasibility of the proposed algorithms.
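
The Map/Reduce division described in the abstract can be illustrated with a small in-process Python sketch. It is not the thesis's implementation and does not run on Hadoop. The nearest-centroid base classifier, the majority vote used for combination, the accuracy-on-recent-data rule used for selection, and the pairing of combination with the no-drift case and selection with the drift case are all assumptions made for illustration. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def train_centroid_classifier(chunk):
    """Map step (sketch): fit a nearest-centroid base classifier on one data chunk."""
    by_label = {}
    for features, label in chunk:
        by_label.setdefault(label, []).append(features)
    # One centroid per class: the per-dimension mean of that class's points.
    return {label: tuple(mean(dim) for dim in zip(*points))
            for label, points in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, chunk):
    """Fraction of examples in the chunk that the classifier labels correctly."""
    return sum(predict(centroids, x) == y for x, y in chunk) / len(chunk)

def reduce_combine(classifiers, features):
    """Reduce step, assumed no-drift case: majority vote over all base classifiers."""
    votes = Counter(predict(c, features) for c in classifiers)
    return votes.most_common(1)[0][0]

def reduce_select(classifiers, recent_chunk):
    """Reduce step, assumed drift case: keep the classifier most accurate on recent data."""
    return max(classifiers, key=lambda c: accuracy(c, recent_chunk))

# Toy data (made up): the first two chunks share one concept,
# the third chunk has its labels flipped to mimic concept drift.
old_a = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"), ((1.0, 1.0), "pos"), ((0.9, 1.1), "pos")]
old_b = [((0.1, 0.2), "neg"), ((0.0, 0.1), "neg"), ((1.1, 0.9), "pos"), ((1.0, 0.8), "pos")]
drifted = [((0.0, 0.1), "pos"), ((0.1, 0.0), "pos"), ((1.0, 0.9), "neg"), ((0.9, 1.0), "neg")]

# "Map": each chunk is trained independently, so the calls could run in parallel.
classifiers = [train_centroid_classifier(chunk) for chunk in (old_a, old_b, drifted)]

# "Reduce": combine all classifiers, or select one using the newest chunk.
combined_label = reduce_combine(classifiers, (0.05, 0.05))
selected = reduce_select(classifiers, drifted)
```

On this toy data the two older classifiers outvote the newest one under combination, while selection keeps only the classifier trained on the drifted chunk. That difference is one plausible reason to choose between the two Reduce strategies according to whether drift is present.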
Keywords/Search Tags: Incremental Classification, Concept Drift, Cloud Computing, Hadoop