Machine Learning Based Patent Categorization

Posted on:2009-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:X L Chu

Full Text:PDF

GTID:2178360242976752

Subject:Computer software and theory

Abstract/Summary:

Automatic patent classification is of great importance. More than 300,000 patents are issued annually, but patent classification currently relies mainly on human experts, and the whole process consumes a great deal of manpower and resources. In addition, patent classification is the basis of patent analysis. By analyzing patents effectively, we can obtain valuable information, such as the technology trends in a domain and the R&D strategy and direction of market competitors. However, patent classification is a large-scale, hierarchically structured, multi-label, and imbalanced text classification problem, which cannot be solved well by traditional machine learning methods.

The support vector machine (SVM), which aims at minimizing an upper bound of the generalization error, has been successfully applied to various pattern classification problems, owing to its powerful learning ability and good generalization performance in comparison with other classification methods. However, training an SVM requires solving a quadratic optimization problem, and the training time is roughly quadratic in the number of training samples. Hence, it is hard to train SVMs on large-scale problems. To deal with these problems, a parallel method for training SVMs, named min-max modular SVMs (M3-SVMs), has been proposed by B. L. Lu and his colleagues. M3-SVMs decompose the original problem into a series of much smaller, independent subproblems, which can be learned in parallel on a cluster. The learning results are then merged by two basic rules to obtain the solution of the original problem.

This thesis proposes the use of M3-SVMs to solve the patent classification problem. Based on M3-SVMs, we propose a prior-knowledge-based problem decomposition method. Using the patents' time information and taxonomy, we can decompose the problem effectively, so that the decomposition result approximates the original data distribution.

Traditional machine learning algorithms, such as SVM, depend heavily on their training parameters. To achieve the best performance, the optimal training parameters must be used. However, parameter tuning takes a lot of time, especially for large-scale problems. For M3-SVMs, parameter tuning can be avoided, because the dependence on parameters is weakened by problem decomposition. In addition, M3-SVMs support incremental learning. This feature is significant for a patent classification system: through incremental learning, the system can learn new patents effectively by reusing the modules it has already learned, and can therefore be updated rapidly.

We performed patent classification experiments on the NTCIR patent data set. We compared the performance of several problem decomposition strategies, as well as the performance of M3-SVMs and SVM. The experimental results show that the prior-knowledge-based decomposition strategy achieves the best performance. M3-SVMs outperform conventional SVM in both training time and generalization performance. In addition, we demonstrated the incremental learning ability of M3-SVMs by simulation.
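The decompose-and-merge idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a two-class problem, and it replaces the SVM base learner with a simple centroid-based linear rule so that the example needs only the Python standard library. The two merging rules are taken to be the MIN and MAX combination usually described in the min-max modular literature: MIN over modules that share a positive subset, then MAX over the MIN outputs. All function names, the year keys, and the toy data are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def train_module(pos, neg):
    """Stand-in base learner (NOT an SVM): linear score from the two class centroids."""
    mp, mn = centroid(pos), centroid(neg)
    w = [a - b for a, b in zip(mp, mn)]
    # The decision boundary passes through the midpoint between the centroids.
    bias = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mp, mn))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) - bias

def train_m3(pos_subsets, neg_subsets):
    """Train one independent module per (positive subset, negative subset) pair.
    The modules do not depend on each other, so they could be trained in parallel."""
    return [[train_module(p, n) for n in neg_subsets] for p in pos_subsets]

def predict_m3(modules, x):
    """MIN over modules sharing a positive subset, then MAX over the MIN outputs.
    A score above zero means the positive class."""
    return max(min(m(x) for m in row) for row in modules)

def add_positive_subset(modules, new_pos, neg_subsets):
    """Incremental update: train only the modules that involve the new subset."""
    modules.append([train_module(new_pos, n) for n in neg_subsets])

# Hypothetical toy data, decomposed by an assumed "year" key (prior knowledge).
pos_by_year = {2001: [(0, 0), (1, 1)], 2002: [(4, 4), (5, 5)]}
neg_by_year = {2001: [(0, 4), (1, 5)], 2002: [(4, 0), (5, 1)]}
pos_subsets = list(pos_by_year.values())
neg_subsets = list(neg_by_year.values())

modules = train_m3(pos_subsets, neg_subsets)

# Baseline: a single linear module trained on all the data at once.
single = train_module(sum(pos_subsets, []), sum(neg_subsets, []))

# Incremental learning: a new batch of positive samples arrives later.
before = predict_m3(modules, (0, 9))
add_positive_subset(modules, [(0, 9), (1, 10)], neg_subsets)
after = predict_m3(modules, (0, 9))
```

The toy data form an XOR-like layout, so the single linear module cannot separate the classes, while the min-max combination of four small linear modules can. The incremental step leaves the existing modules untouched and trains only one new row of modules, which is the property the abstract highlights for rapid system updates.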

Keywords/Search Tags:

min-max modular network, support vector machine, large-scale classification problem, patent classification

Related items

1	Parallel Min-Max Modular Support Vector Machine With Application To Patent Classification
2	Large-scale Patent Classification Based On Parallel Machine Learning
3	Research On Large Scale Sparse Support Vector Machines
4	Research On Ensemble Learning
5	Large Scale Classification Algorithms Based On Clustering Feature Trees
6	Research On Patent Value Classification Prediction Model Based On Machine Learning
7	The Research On Large-scale Support Vector Machine And The Applications
8	Support Vector Machine For Solving Classification Problem And Its Improvement Strategies
9	Research On SVM Based On Large-Scale Training Set
10	PU Problem Classification Algorithm Based On Support Vector Machine