Parallel Min-Max Modular Support Vector Machine With Application To Patent Classification

Posted on:2010-08-20

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Ye

Full Text:PDF

GTID:2178360275470257

Subject:Computer software and theory

Abstract/Summary:

Large-scale machine learning problems often limit the practical application of many machine learning algorithms. Such problems are common; patent classification is one example. Even for efficient algorithms such as the Support Vector Machine (SVM), large-scale problems remain difficult. A feasible approach is to break through single-machine constraints and use a parallel computing environment to solve these large-scale problems. The Min-Max Modular Support Vector Machine (M3-SVM) is a "divide and conquer" algorithm that can effectively solve large-scale problems. It decomposes a large-scale problem into many subproblems and then combines the classifiers trained on those subproblems to give a solution to the original problem. The algorithm is therefore inherently parallel.

In this work, we analyze the implementation of parallel M3-SVM and its training and testing time complexity. Building on the original min-max modular parallel testing algorithm, we propose pipeline-style Symmetric Classifier Selection (SCS), Asymmetric Classifier Selection (ACS), and Decision-tree Classifier Selection (DCS) algorithms. Experimental results show that the pipeline-style classifier selection algorithms significantly accelerate the testing step. For problem decomposition, we propose a new "centroid connection" based algorithm, which proves to be the most effective when no prior knowledge is used.

In application, we use parallel M3-SVM to solve a large-scale text classification problem, in particular Japanese patent classification in a computer cluster environment. Comparing M3-SVM with the traditional SVMlight, we find that M3-SVM is both more efficient and more effective.

In addition, because M3-SVM can decompose an imbalanced problem into many balanced subproblems, it can handle class imbalance effectively. We carried out a systematic comparison of M3-SVM with several popular algorithms on three quite different imbalanced problems and found that M3-SVM is more effective than cost-sensitive learning and SMOTE re-sampling.
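The decomposition and min-max combination described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the positive and negative training sets are split round-robin into `n_pos` and `n_neg` parts (a hypothetical decomposition, not the "centroid connection" method), one module is trained per positive/negative pair, and module outputs are combined by taking the MIN over each row and then the MAX over rows. A nearest-centroid scorer (`train_stand_in`, hypothetical) stands in for the SVM so the sketch runs with the standard library only. `predict_m3_lazy` is a short-circuit evaluation in the general spirit of classifier selection; it is not the SCS, ACS, or DCS algorithm from this work.

```python
"""Minimal sketch of min-max modular (M3) combination for a two-class problem.

Assumption: a nearest-centroid scorer stands in for the SVM trained on each
subproblem, so the sketch runs with the standard library only.
"""
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def split(samples: List[Vector], n_parts: int) -> List[List[Vector]]:
    """Hypothetical decomposition: deal samples round-robin into n_parts subsets."""
    return [list(samples[k::n_parts]) for k in range(n_parts)]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def train_stand_in(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical stand-in for an SVM: a score > 0 means 'positive class'."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x: Vector) -> float:
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp  # positive when x is closer to the positive centroid

    return score


def train_m3(pos, neg, n_pos: int, n_neg: int, trainer=train_stand_in) -> List[List[Scorer]]:
    """Train one module per (positive part i, negative part j) pair.

    The n_pos * n_neg calls to `trainer` are independent, so they could run in parallel.
    """
    pos_parts, neg_parts = split(pos, n_pos), split(neg, n_neg)
    return [[trainer(p, n) for n in neg_parts] for p in pos_parts]


def predict_m3(modules: List[List[Scorer]], x: Vector) -> float:
    """Min-max combination: MIN over each row (negative parts), then MAX over rows."""
    return max(min(f(x) for f in row) for row in modules)


def predict_m3_lazy(modules: List[List[Scorer]], x: Vector) -> Tuple[int, int]:
    """Short-circuit variant returning (label, number of modules evaluated)."""
    evaluated = 0
    for row in modules:
        row_positive = True
        for f in row:
            evaluated += 1
            if f(x) <= 0:  # this row's MIN is already <= 0: skip its remaining modules
                row_positive = False
                break
        if row_positive:  # the overall MAX is already > 0: skip the remaining rows
            return 1, evaluated
    return -1, evaluated


if __name__ == "__main__":
    pos = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
    neg = [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1], [2.1, 2.3], [1.9, 1.7], [2.3, 2.2]]
    modules = train_m3(pos, neg, n_pos=2, n_neg=3)
    for x in ([0.1, 0.1], [2.0, 2.0]):
        print(x, predict_m3(modules, x) > 0, predict_m3_lazy(modules, x))
```

Because the `n_pos × n_neg` modules are trained independently, each `trainer` call could run on a separate cluster node. Choosing `n_neg` so that each negative part is about the size of a positive part (2 samples each in the toy data) yields the balanced subproblems that make the approach useful for class imbalance. On the toy data, `predict_m3_lazy` returns `(1, 3)` for `[0.1, 0.1]` and `(-1, 2)` for `[2.0, 2.0]`, matching the labels from `predict_m3` while evaluating 3 and 2 of the 6 modules.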

Keywords/Search Tags:

Support Vector Machine, Min-max Modular SVM, large-scale learning problem, class imbalance problem, parallel machine learning, patent classification, classifier selection algorithm, pipeline

Related items

1	Machine Learning Based Patent Categorization
2	Large-scale Patent Classification Based On Parallel Machine Learning
3	Research On Fuzzy Support Vector Machine Algorithm For Class Imbalance Learning
4	Study On The Incremental Learning Algorithms For Support Vector Machines
5	Research On Large Scale Sparse Support Vector Machines
6	PU Problem Classification Algorithm Based On Support Vector Machine
7	Research On Svm Based On Large-Scale Training Set
8	The Research Of Imbalanced Data Classification Algorithm Based On Support Vector Machine
9	The Research Of Classification Algorithm Based On Support Vector Machine
10	Research On Patent Value Classification Prediction Model Based On Machine Learning