Clustering Feature Tree For Large-Scale Support Vector Machines

Posted on:2013-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:F Z Xiao

GTID:2248330374974892

Subject:Computational Mathematics

Abstract/Summary:

With the rapid development of the Internet, data volumes are growing at an unprecedented rate, and big data has become a hot topic in machine learning. The Support Vector Machine (SVM), proposed by Cortes and Vapnik, was the first classification method based on statistical learning theory and offers excellent learning and generalization ability among machine learning algorithms. However, when the training dataset is very large, its demand for computing resources grows too quickly. To extend SVM to large-scale datasets, this thesis studies the problem from the following aspects.

First, based on the idea of local learning, this thesis proposes HCLL-SVM, a large-scale classification algorithm that combines the clustering algorithm BIRCH with the classification algorithm SVM. HCLL-SVM partitions the training dataset into a number of local labeled subclusters using the hierarchical clustering feature (CF) structure of BIRCH, then builds a local classifier for each local labeled subcluster using SVM.

Second, for each test sample, HCLL-SVM selects the closest local SVM classifier to classify it. Extensive experiments on fourteen benchmark datasets show that HCLL-SVM improves training and testing speed on large-scale datasets while maintaining testing accuracy.

Finally, this thesis examines how the branching factor of the CF tree relates to the time needed to build the tree, the time needed to train the model, the accuracy, and the testing time. The results show that all of these measurements are best when the branching factor is between 5 and 10.

Additionally, this thesis runs an experiment on a large-scale dataset of 8,100,000 samples with 1.5 GB of memory (not enough to hold the whole dataset) to show that HCLL-SVM can solve large-scale classification problems with limited system resources.
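The partition-then-route idea described in the abstract can be illustrated with a short plain-Python sketch. It is not the thesis's implementation, and every name in it is hypothetical. It rests on several assumptions: a flat list of clustering features (N, LS, SS) with a centroid-distance threshold stands in for the full BIRCH CF tree; a small linear SVM trained by Pegasos-style subgradient descent stands in for the local SVM solver; each local hyperplane is assumed to pass through its subcluster centroid (no bias term); and a subcluster containing a single class simply predicts that class.

```python
import math
import random

class CF:
    """Clustering feature (N, LS, SS) of one subcluster, plus member indices."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # linear sum of the points
        self.ss = 0.0          # sum of squared norms (kept to show the full triple)
        self.members = []

    def add(self, idx, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)
        self.members.append(idx)

    def centroid(self):
        return [v / self.n for v in self.ls]

def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def partition(X, threshold):
    """Simplified stand-in for CF-tree insertion (assumption): a point joins the
    nearest subcluster if its centroid is within `threshold`, else opens a new one."""
    cfs = []
    for i, x in enumerate(X):
        best = min(cfs, key=lambda c: sqdist(c.centroid(), x), default=None)
        if best is None or math.sqrt(sqdist(best.centroid(), x)) > threshold:
            best = CF(len(x))
            cfs.append(best)
        best.add(i, x)
    return cfs

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
            w = [(1 - eta * lam) * a for a in w]       # shrink (regularizer)
            if margin < 1:                              # hinge-loss violation
                w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w

def fit_local_models(X, y, threshold):
    """Train one local classifier per subcluster, on centroid-centered points."""
    models = []
    for cf in partition(X, threshold):
        c = cf.centroid()
        labels = [y[i] for i in cf.members]
        if len(set(labels)) == 1:
            model = ("const", labels[0])
        else:
            centered = [[v - m for v, m in zip(X[i], c)] for i in cf.members]
            model = ("svm", train_linear_svm(centered, labels))
        models.append((c, model))
    return models

def predict(models, x):
    """Route x to the local model with the closest centroid, then classify."""
    c, (kind, m) = min(models, key=lambda cm: sqdist(cm[0], x))
    if kind == "const":
        return m
    score = sum(a * (v - ci) for a, v, ci in zip(m, x, c))
    return 1 if score >= 0 else -1

# Toy data: two distant regions, each with its own local decision boundary.
X = [(0.0, 1.0), (0.2, 1.0), (0.1, 1.2), (0.0, -1.0), (0.2, -1.0), (0.1, -1.2),
     (9.0, 10.0), (9.0, 10.2), (8.8, 10.1), (11.0, 10.0), (11.0, 10.2), (11.2, 10.1)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
models = fit_local_models(X, y, threshold=3.0)
print(len(models), predict(models, (0.1, 0.9)), predict(models, (10.9, 10.1)))
```

In this toy setup the first region is split horizontally and the second vertically, so each local classifier only has to learn the boundary of its own region, and a test point is handled by whichever model owns the nearest centroid. Each local model trains on only its own subcluster's points, which is the property the abstract relies on for speed and limited memory.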

Keywords/Search Tags:

Local learning, SVM algorithm, Local SVM classifier, Large scale dataset

Related items

1	The Algorithm Research Of Support Vector Machine Based On The Decomposition Of The KD Tree
2	Local Learning And Global Preserving Based Semi-supervised Algorithm For Large Scale Classification Problems
3	Local Online Learning For Large Scale Data
4	Research On SVM Based On Large-Scale Training Set
5	New Algorithms Based On Decomposing And Local Search For Large Scale Global Optimization
6	Studies On Classifiers Based On Decision Boundaries From The Perspective Of Dividing Data Space
7	Study Of DEM's Management And Scheduling Key Technologies Based On Large-Scale Dataset
8	Eye Detection Based On Local Features
9	A Near-Infrared Survey: Local Large-Scale Structure and The Near-Infrared Background
10	Optimization Of The Local Static Routing Algorithm On Pdns