
The Research On Classification Algorithms For Active Learning Based On Big Data

Posted on: 2016-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Huang
Full Text: PDF
GTID: 2428330542457356
Subject: Computer application technology
Abstract/Summary:
Classification is an important branch of data mining and an active research topic in artificial intelligence and databases. Traditional classification algorithms are supervised, so they need a large number of labeled samples to construct the training set used to train the model. Real-world data, however, usually arrives unlabeled, and the training set for an experiment must be labeled manually, which is costly and makes the experiment difficult to complete. The problem is especially acute now that big data has increased the demand for training data, and it poses a major challenge for research on classification algorithms. Active learning offers a way to meet this challenge: its aim is to select, from the sample pool, the unlabeled samples that carry information helpful for classification, and to construct the training set from them. This approach reduces the cost of manual labeling and also shrinks the training set, making it an effective way to relieve this bottleneck in classification. This thesis combines active learning with two classical classification algorithms (SVM and KNN) in order to exploit these gains. We first introduce the theory and execution process of active learning and of the two algorithms in detail, and then present our improved algorithms. The contributions of this thesis are as follows:
1. We present a multi-class SVM algorithm with active learning, BC-Multiple-SVM, propose a remedy for the class imbalance that arises during sample selection, and give a MapReduce implementation of the algorithm.
2. We present a KNN algorithm with active learning, Uncertainty-KNN, propose improvements to the sample calculation and to the sample selection method, and give a MapReduce implementation of the algorithm.
3. We test the algorithms on the Hadoop platform with five datasets. The experimental results show that the precision of the multi-class SVM with active learning under the sample-balancing strategy is better than that of the previous algorithm without this strategy. Overall, the training set constructed by the algorithms with active learning is smaller, while their precision is almost the same as that of the supervised algorithms.
4. In addition, we analyze the run time of SVM and KNN across the different datasets, and the application of the different algorithms to the different datasets.
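The pool-based selection idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Uncertainty-KNN or its MapReduce implementation, whose details the abstract does not give. It is a generic single-machine example of uncertainty sampling that assumes the entropy of the k-nearest-neighbor vote as the uncertainty measure. All function names (`knn_vote_distribution`, `vote_entropy`, `select_most_uncertain`, `active_learning_round`) and the `oracle` callback standing in for the human annotator are hypothetical.

```python
import math
from collections import Counter


def knn_vote_distribution(x, labeled, k):
    """Return {label: vote fraction} among the k labeled points nearest to x."""
    neighbours = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


def vote_entropy(distribution):
    """Shannon entropy of the vote; 0 means all neighbours agree."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)


def select_most_uncertain(pool, labeled, k=3, batch_size=1):
    """Return indices of the pool samples whose KNN vote is most uncertain."""
    scored = [
        (vote_entropy(knn_vote_distribution(x, labeled, k)), index)
        for index, x in enumerate(pool)
    ]
    # Highest entropy first; ties are broken by the lower pool index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:batch_size]]


def active_learning_round(pool, labeled, oracle, k=3):
    """Query the (assumed) oracle for one sample and move it to the labeled set."""
    index = select_most_uncertain(pool, labeled, k)[0]
    x = pool[index]
    new_pool = pool[:index] + pool[index + 1:]
    return new_pool, labeled + [(x, oracle(x))]


# Toy data: two well-separated classes and one ambiguous pool point.
labeled = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
           ((10.0, 10.0), "b"), ((10.0, 11.0), "b")]
pool = [(0.1, 0.1), (5.0, 5.2), (10.1, 10.1)]

chosen = select_most_uncertain(pool, labeled, k=2)
new_pool, new_labeled = active_learning_round(
    pool, labeled, oracle=lambda x: "a" if x[0] < 5 else "b", k=2)
```

In this toy run only the middle pool point has neighbours from both classes, so it is the one sent for labeling. The other two points are left in the pool, which is how the training set stays small.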
Keywords/Search Tags:Classification, Active Learning, SVM, KNN, MapReduce