
Machine Learning Based Patent Categorization

Posted on: 2009-04-11   Degree: Master   Type: Thesis
Country: China   Candidate: X L Chu
GTID: 2178360242976752   Subject: Computer software and theory
Abstract/Summary:
Automatic patent classification is of great practical importance. More than 300,000 patents are issued every year, yet patent classification still relies mainly on human experts, a process that consumes considerable manpower and resources. Patent classification is also the basis of patent analysis: by analyzing patents effectively, we can extract valuable information such as the technology trends in a domain and the R&D strategy and direction of market competitors. However, patent classification is a large-scale, hierarchically structured, multi-label, and imbalanced text classification problem, which traditional machine learning methods cannot solve well.

The support vector machine (SVM), which aims at minimizing an upper bound of the generalization error, has been applied successfully to many pattern classification problems because of its strong learning ability and good generalization performance compared with other classification methods. However, training an SVM requires solving a quadratic optimization problem, and its training time grows roughly quadratically with the number of training samples, so large-scale problems are hard to learn. To address this, B. L. Lu and his colleagues proposed a parallel method for training SVMs, the min-max modular SVM (M3-SVM). An M3-SVM decomposes the original problem into a series of much smaller, independent subproblems that can be learned in parallel on a cluster. The learned modules are then merged by two basic combination rules to obtain the solution of the original problem.

This thesis applies M3-SVMs to the patent classification problem. Building on M3-SVMs, we propose a problem decomposition method based on prior knowledge: using the patents' time information and their taxonomy, the problem can be decomposed effectively so that the resulting subproblems approximate the original data distribution.

Traditional machine learning algorithms such as SVM depend heavily on their training parameters.
To achieve the best performance, the optimal training parameters must be used, but tuning them is time-consuming, especially for large-scale problems. For M3-SVMs, parameter tuning can be avoided, because problem decomposition weakens the dependence on the training parameters. M3-SVMs also support incremental learning, which is important for a patent classification system: by reusing the modules it has already learned, the system can learn new patents efficiently and can therefore be updated rapidly.

We performed patent classification experiments on the NTCIR patent data set, comparing several problem decomposition strategies as well as M3-SVMs against a conventional SVM. The experimental results show that the prior-knowledge-based decomposition strategy achieves the best performance, and that M3-SVMs outperform the conventional SVM in both training time and generalization performance. In addition, we demonstrated the incremental learning ability of M3-SVMs through simulation.
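The decomposition, merging, and incremental-learning steps summarized above can be illustrated with a small two-class sketch. It rests on several assumptions that are not taken from the thesis: a linear SVM trained with Pegasos subgradient steps stands in for the quadratic-programming SVM solver, the filing year is used as the prior-knowledge key for splitting each class into chunks, and incremental learning is modelled as training only the modules that pair a newly added chunk with existing ones. The names `train_linear_svm`, `group_by_key`, and `MinMaxModular` are hypothetical. Each (positive chunk, negative chunk) pair gets its own module, and module outputs are merged with a MIN rule over negative chunks followed by a MAX rule over positive chunks.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def train_linear_svm(pos: List[Vector], neg: List[Vector], lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> Scorer:
    """Linear SVM trained with Pegasos subgradient steps on the regularized hinge loss.

    Assumed stand-in for a QP-based SVM solver; the bias is absorbed as a constant feature 1.0.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], 1.0) for x in pos] + [(list(x) + [1.0], -1.0) for x in neg]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step toward the sample
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def group_by_key(samples: List[Tuple[Vector, int]]) -> List[List[Vector]]:
    """Hypothetical prior-knowledge split: one chunk per key (here, a filing year)."""
    groups: Dict[int, List[Vector]] = defaultdict(list)
    for x, key in samples:
        groups[key].append(x)
    return [groups[k] for k in sorted(groups)]


class MinMaxModular:
    """Two-class min-max modular network over independently trained SVM modules."""

    def __init__(self) -> None:
        self.pos_chunks: List[List[Vector]] = []
        self.neg_chunks: List[List[Vector]] = []
        self.modules: Dict[Tuple[int, int], Scorer] = {}

    def add_chunks(self, pos_chunks: Sequence[List[Vector]] = (),
                   neg_chunks: Sequence[List[Vector]] = ()) -> int:
        """Add chunks and train only the (i, j) modules that do not exist yet.

        Existing modules are reused unchanged (incremental learning). The new
        subproblems are independent, so they could run in parallel; this sketch
        trains them one after another. Returns the number of modules trained.
        """
        self.pos_chunks.extend(pos_chunks)
        self.neg_chunks.extend(neg_chunks)
        trained = 0
        for i, p in enumerate(self.pos_chunks):
            for j, n in enumerate(self.neg_chunks):
                if (i, j) not in self.modules:
                    self.modules[(i, j)] = train_linear_svm(p, n)
                    trained += 1
        return trained

    def score(self, x: Vector) -> float:
        # MIN rule over negative chunks for each positive chunk, then MAX rule across them.
        return max(
            min(self.modules[(i, j)](x) for j in range(len(self.neg_chunks)))
            for i in range(len(self.pos_chunks))
        )

    def predict(self, x: Vector) -> int:
        return 1 if self.score(x) > 0 else -1


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data (feature vector, year): positives near (2, 2), negatives near (-2, -2).
    pos = [((2 + rng.gauss(0, 0.5), 2 + rng.gauss(0, 0.5)), 2000 + k % 2) for k in range(20)]
    neg = [((-2 + rng.gauss(0, 0.5), -2 + rng.gauss(0, 0.5)), 2000 + k % 2) for k in range(20)]
    net = MinMaxModular()
    print(net.add_chunks(group_by_key(pos), group_by_key(neg)))  # 4: 2 x 2 modules
    new = [((2 + rng.gauss(0, 0.5), 2 + rng.gauss(0, 0.5)), 2002) for _ in range(10)]
    print(net.add_chunks(pos_chunks=group_by_key(new)))  # 2: new chunk vs. 2 old negative chunks
    print(net.predict((2.0, 2.0)), net.predict((-2.0, -2.0)))
```

Pairing every positive chunk with every negative chunk keeps each module small and independent, which is what makes parallel training and module reuse possible; the cost is that the number of modules grows as the product of the two chunk counts.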
Keywords/Search Tags: min-max modular network, support vector machine, large-scale classification problem, patent classification