
Parallel Min-Max Modular Support Vector Machine With Application To Patent Classification

Posted on: 2010-08-20  Degree: Master  Type: Thesis
Country: China  Candidate: Z F Ye
GTID: 2178360275470257  Subject: Computer software and theory
Abstract/Summary:
Large-scale machine learning problems often restrict the practical application of many machine learning algorithms. Such problems are common; patent classification is one example. Even for efficient algorithms such as the Support Vector Machine (SVM), large-scale problems remain difficult. A feasible way to break through the limits of a single machine is to solve these problems in a parallel computing environment. The Min-Max Modular Support Vector Machine (M3-SVM) is a "divide and conquer" algorithm that can effectively solve large-scale problems. It decomposes a large-scale problem into many subproblems, trains classifiers on those subproblems, and combines them to produce a solution to the original problem. The algorithm is therefore inherently parallel.

In this work, we analyze the implementation of parallel M3-SVM and its training and testing time complexity. Building on the original min-max modular parallel testing algorithm, we propose pipelined Symmetric Classifier Selection (SCS), Asymmetric Classifier Selection (ACS), and Decision tree Classifier Selection (DCS) algorithms. Experimental results show that the pipelined classifier selection algorithms significantly accelerate the testing step. For problem decomposition, we propose a new "centroid connection" based algorithm, which proves to be the most effective when no prior knowledge is used.

As an application, we solve a large-scale text classification problem with parallel M3-SVM; in particular, we solve Japanese patent classification in a computer cluster environment. Comparing M3-SVM with the traditional SVMlight, we find that M3-SVM is both more efficient and more effective.

In addition, because M3-SVM can decompose an imbalanced problem into many balanced subproblems, it handles class imbalance effectively. We systematically compared M3-SVM with several popular algorithms on three quite different imbalanced problems and found that M3-SVM is more effective than cost-sensitive learning and SMOTE re-sampling.
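The decompose-train-combine idea can be illustrated with a small sketch. It is not the thesis's implementation. The names `partition`, `train_base`, `train_m3` and `predict_m3` are hypothetical. The interleaved split stands in for the thesis's decomposition methods, including its centroid-connection algorithm. A nearest-centroid linear rule stands in for each SVM module. The combination step takes the MIN over each row of modules and then the MAX over rows, using only the sign of each output. That sign-only view also shows why some modules can be skipped at test time. The thesis's SCS, ACS and DCS pipelines are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], float]  # score > 0 means "positive class"


def partition(samples: Sequence[Point], k: int) -> List[List[Point]]:
    """Hypothetical decomposition: split samples into k interleaved chunks."""
    return [list(samples[i::k]) for i in range(k)]


def centroid(points: Sequence[Point]) -> List[float]:
    return [sum(coord) / len(points) for coord in zip(*points)]


def train_base(pos: Sequence[Point], neg: Sequence[Point]) -> Classifier:
    """Hypothetical stand-in for an SVM: a linear rule whose hyperplane
    bisects the segment between the two class centroids."""
    cp, cn = centroid(pos), centroid(neg)
    w = [a - b for a, b in zip(cp, cn)]
    mid = [(a + b) / 2 for a, b in zip(cp, cn)]
    bias = -sum(wi * mi for wi, mi in zip(w, mid))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_m3(pos, neg, n_pos: int, n_neg: int) -> List[List[Classifier]]:
    """Train one module per (positive chunk, negative chunk) pair. Each module
    sees only its two chunks, so the modules could be trained in parallel."""
    pos_parts = partition(pos, n_pos)
    neg_parts = partition(neg, n_neg)
    return [[train_base(p, n) for n in neg_parts] for p in pos_parts]


def predict_m3(modules: List[List[Classifier]], x: Point) -> int:
    """Min-max combination on output signs: a row's MIN is positive iff every
    module in the row is positive, and the MAX is positive iff any row is."""
    for row in modules:
        # all() stops at the first non-positive module, skipping the rest of the row
        if all(clf(x) > 0 for clf in row):
            return 1  # one positive MIN unit already makes the MAX positive
    return -1


pos = [(2.0, 2.0), (3.0, 2.0)]
neg = [(-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0),
       (-3.0, -3.0), (-1.0, -2.0), (-2.0, -1.0)]
# 2 positives vs 6 negatives: splitting the negatives into 3 chunks yields
# three balanced 2-vs-2 subproblems.
modules = train_m3(pos, neg, n_pos=1, n_neg=3)
print(predict_m3(modules, (2.5, 2.5)))    # → 1
print(predict_m3(modules, (-2.5, -2.5)))  # → -1
```

The usage lines show how decomposition can turn a 2-vs-6 imbalanced problem into balanced subproblems, which is the property the abstract credits for M3-SVM's handling of class imbalance.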
Keywords/Search Tags: Support Vector Machine, Min-Max Modular SVM, large-scale learning problem, class imbalance problem, parallel machine learning, patent classification, classifier selection algorithm, pipeline