Incremental Learning Method Based On Cloud Computing

Posted on:2013-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:M Li

Full Text:PDF

GTID:2218330371457555

Subject:Computer application technology

Abstract/Summary:

Since its appearance, the Internet has been widely used in many fields, and the amount of data flowing through it has grown to at least 20 billion times its 1969 level. At the same time, data is generated ever faster and in ever larger volumes, giving rise to what is now called massive data. To feed newly produced data into a learning system, an incremental learning model must be adopted. How to efficiently mine the valuable and understandable knowledge that lies in massive data has become a new challenge in the field of incremental data mining.

In recent years, cloud computing has brought new opportunities for massive data mining. On the one hand, cloud computing can integrate computing resources across a wide area network and provide a physical basis for data mining. On the other hand, parallel computing technology is the core of cloud computing. With MapReduce, the distributed programming model of cloud computing, a program can be automatically distributed to large clusters of commodity computers and executed in parallel. In addition, Hadoop, an open-source project of the Apache Software Foundation, makes it quick and easy to build computer clusters: its HDFS file system can store large-scale data, and its MapReduce programming framework enables fast parallel computing. The design of Hadoop-based incremental classification algorithms for massive data mining is therefore highly significant.

To achieve incremental learning on the Hadoop cloud computing platform, the key question is how to parallelize traditional incremental learning algorithms. In this thesis, we analyze the mechanism of the MapReduce framework and the characteristics of traditional incremental learning algorithms. The idea of ensemble learning is then integrated into incremental learning, and two incremental classification algorithms based on the cloud computing platform are presented. In both algorithms, the Map stage trains the base classifiers, and different Map tasks can be executed in parallel. Depending on whether concept drift occurs in the learning environment, the Reduce stage integrates the results of the base classifiers by either classifier combination or classifier selection. Experiments on several data sets (KDD Cup 2010 and the Hyperplane simulation, among others) show the correctness and feasibility of the proposed algorithms.
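
The Map/Reduce division described in the abstract can be illustrated with a small sketch in plain Python (no Hadoop API is used). The abstract does not specify the base learner, the combination rule, or the selection criterion, so everything concrete here is an assumption made for illustration: a nearest-centroid base classifier, majority voting for classifier combination, accuracy on the most recent data for classifier selection, and the pairing of combination with a stable environment and selection with a drifting one. All function and variable names are hypothetical.

```python
from collections import Counter
from statistics import mean


def train_centroid_classifier(chunk):
    """Map step (assumed base learner): fit a nearest-centroid classifier
    on one chunk of (features, label) pairs and return its predict function."""
    rows_by_label = {}
    for features, label in chunk:
        rows_by_label.setdefault(label, []).append(features)
    # One centroid per class: the per-dimension mean of that class's rows.
    centroids = {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

    def predict(x):
        # Return the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def map_stage(chunks):
    """Train one base classifier per chunk. The chunks are independent,
    so on a cluster each call could run as a separate Map task."""
    return [train_centroid_classifier(chunk) for chunk in chunks]


def reduce_combine(classifiers):
    """Reduce step, assumed variant for a stable environment:
    combine all base classifiers by majority vote."""
    def predict(x):
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]

    return predict


def accuracy(classifier, data):
    """Fraction of (features, label) pairs the classifier labels correctly."""
    return sum(classifier(x) == y for x, y in data) / len(data)


def reduce_select(classifiers, recent_data):
    """Reduce step, assumed variant under concept drift:
    keep the base classifier that is most accurate on the newest data."""
    return max(classifiers, key=lambda clf: accuracy(clf, recent_data))


# Toy data (hypothetical): three chunks follow the old concept
# (positive x -> label 1); the newest chunk follows a flipped concept.
stable_chunks = [
    [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)],
    [([-3.0], 0), ([3.0], 1)],
    [([-1.5], 0), ([1.5], 1)],
]
drifted_chunk = [([-2.0], 1), ([-1.0], 1), ([1.0], 0), ([2.0], 0)]

base_classifiers = map_stage(stable_chunks + [drifted_chunk])
combined = reduce_combine(base_classifiers)
selected = reduce_select(base_classifiers, drifted_chunk)
```

On this toy data the majority vote still follows the three older classifiers, while selection on the newest chunk keeps only the classifier trained after the concept changed. This is one way to see why a drifting environment might call for selection over combination.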

Keywords/Search Tags:

Incremental Classification, Concept Drift, Cloud Computing, Hadoop

Related items

1	Incremental Support Vector Machine Algorithm Integrated With Cloud Computing And Application Research
2	Research On Decision Tree Classification Algorithm Based On Hadoop
3	Research On Classification Algorithm Used HADOOP
4	The Recommendation Algorithm Based On Cloud Computing Research
5	The Research And Application Of Text Classification Based On Cloud Computing
6	Research And Improvement On Data Classification Algorithms In Cloud Environment
7	The Parallel Research On Decision Tree Classification Algorithm Based On Hadoop
8	Research And Implementation Of Naïve Bayes Text Classification Based On Cloud
9	The Construction Of Cloud Computing Platform Based On Hadoop&Integration Issue Research With IoT
10	Research On The Key Technologies Of Cloud Computing Platform Hadoop