Clustering Feature Tree For Large-Scale Support Vector Machines

Posted on: 2013-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Xiao
GTID: 2248330374974892
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of the Internet, data is growing at an unprecedented rate, and big data has become a hot topic in machine learning. The Support Vector Machine (SVM), proposed by Cortes and Vapnik, was the first classification method based on statistical learning theory, and it achieves excellent learning and generalization performance among machine learning algorithms. However, when the training dataset is very large, its demand for computing resources grows too quickly. To extend SVM to large-scale datasets, this thesis studies the problem from the following aspects.

First, based on the idea of local learning, this thesis proposes HCLL-SVM, a large-scale classification algorithm that combines the clustering algorithm BIRCH with the SVM classifier. HCLL-SVM partitions the training dataset into a number of local labeled subclusters using the hierarchical clustering feature (CF) structure of BIRCH, then trains a local SVM classifier on each local labeled subcluster.

Second, for each test sample, HCLL-SVM selects the closest local SVM classifier to classify it. Extensive experiments on fourteen benchmark datasets show that HCLL-SVM speeds up both training and testing on large-scale datasets while maintaining test accuracy.

Finally, this thesis examines how the branching factor of the CF tree affects the time needed to build the tree, the time needed to train the model, the accuracy, and the testing time. The results show that all of these measurements are best when the branching factor is between 5 and 10.

Additionally, this thesis runs an experiment on a large-scale dataset of 8,100,000 samples using 1.5 GB of memory (not enough to hold the whole dataset) to show that HCLL-SVM can solve large-scale classification problems with limited system resources.
Keywords/Search Tags: Local learning, SVM algorithm, Local SVM classifier, Large scale dataset
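
The two mechanisms named in the abstract, summarizing subclusters with BIRCH clustering features and routing a test sample to the closest local classifier, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The triple (N, LS, SS) is the standard BIRCH clustering feature, which the abstract does not spell out. Measuring "closest" by Euclidean distance to the subcluster centroid is an assumption, and the constant-valued callables are placeholders for trained local SVMs. All names (`CF`, `closest_subcluster`, `predict`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CF:
    """BIRCH clustering feature: point count, linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "CF":
        return cls(1, list(x), sum(v * v for v in x))

    def merge(self, other: "CF") -> "CF":
        # CFs are additive, so subclusters merge without revisiting raw points.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root mean squared distance of the member points to the centroid.
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(variance, 0.0))


def closest_subcluster(x: Sequence[float], cfs: Sequence[CF]) -> int:
    """Index of the subcluster whose centroid is nearest to x (assumed metric)."""
    return min(range(len(cfs)), key=lambda i: math.dist(x, cfs[i].centroid()))


def predict(x: Sequence[float],
            cfs: Sequence[CF],
            local_classifiers: Sequence[Callable[[Sequence[float]], int]]) -> int:
    """Route x to the local classifier attached to its nearest subcluster."""
    return local_classifiers[closest_subcluster(x, cfs)](x)


# Two toy subclusters built purely from CF additions.
left = CF.from_point((0.0, 0.0)).merge(CF.from_point((0.0, 2.0)))
right = CF.from_point((10.0, 0.0)).merge(CF.from_point((10.0, 2.0)))

# Placeholders for local SVMs: each one simply returns a fixed label.
classifiers = [lambda x: -1, lambda x: +1]

label_near_left = predict((1.0, 1.0), [left, right], classifiers)
label_near_right = predict((9.0, 0.0), [left, right], classifiers)
```

Because each subcluster is represented only by its CF, the routing step needs one centroid per subcluster rather than the training points themselves, which is consistent with the abstract's claim that the method can run when the full dataset does not fit in memory.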