
Large-scale Patent Classification Based On Parallel Machine Learning

Posted on: 2012-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q Kong
Full Text: PDF
GTID: 2178330338984137
Subject: Computer software and theory
Abstract/Summary:
Many practical problems today can be treated as large-scale pattern recognition problems, such as web data mining and the analysis of passengers in transport systems. At this scale, however, many conventional classifiers struggle, even efficient algorithms such as SVM. At the same time, more and more computing resources are becoming available, so using abundant large-scale parallel computing resources to solve real-world problems is a feasible approach.

Patent text classification is a large-scale, imbalanced classification problem of high practical significance, for example in analyzing trends in a field of technology. To solve practical problems such as patent classification, we use an algorithm built on a parallel structure that exploits these abundant computing resources to obtain an effective classification model for the original problem. Bao-liang Lu and his collaborators have proposed a parallel network, called the Min-Max modular network (M3), which uses "divide and conquer" to solve large-scale problems.

To achieve parallelism in M3, a single large-scale problem is decomposed into a large number of small-scale problems. These small modules are simple, easy to solve, and independent of each other, and each yields a sub-solution of the problem. The modules are then merged by rules to obtain the solution of the original problem.

Precision is the most important criterion in a classification problem. To address this, we used the asymmetric selection algorithm, the symmetric selection algorithm, and the decision tree selection algorithm. Based on them, we propose the assistant classifier module selection strategy (ACMSS). Experiments show that ACMSS can effectively improve classification performance.

We use a variety of decomposition strategies and combination methods. Compared with the conventional support vector machine, the ACMSS algorithm combined with the prior knowledge decomposition strategy provides much better performance. ACMSS generalizes well and is highly adaptable. It can compute the weights of sub-classifiers automatically, which has been demonstrated in a large number of experiments.
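
The decompose-and-merge idea described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation. It assumes the commonly described min-max combination for a two-class problem: take the MIN over all modules that share a positive subset, then the MAX across positive subsets. The base learner (`train_module`, a nearest-centroid scorer), the round-robin `split`, and the toy data are hypothetical stand-ins chosen to keep the example self-contained. ACMSS and the prior knowledge decomposition are not modeled here.

```python
def split(items, k):
    """Partition items into k chunks (round-robin; a stand-in for a real decomposition strategy)."""
    return [items[i::k] for i in range(k)]


def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def train_module(pos, neg):
    """Hypothetical base learner: positive score means x is closer to the positive subset."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x):
        dist_pos = sum((a - b) ** 2 for a, b in zip(x, cp))
        dist_neg = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dist_neg - dist_pos

    return score


def train_m3(pos, neg, n_pos, n_neg):
    """Train one small module per (positive subset, negative subset) pair.

    Each module sees only its own pair of subsets, so the modules are
    independent of each other and could be trained in parallel.
    """
    pos_parts, neg_parts = split(pos, n_pos), split(neg, n_neg)
    return [[train_module(p, n) for n in neg_parts] for p in pos_parts]


def predict_m3(modules, x):
    """Merge module outputs: MIN within each positive subset's row, then MAX across rows."""
    return max(min(m(x) for m in row) for row in modules)


# Toy XOR-like data: each class forms two clusters, so a single
# nearest-centroid classifier cannot separate the classes.
pos = [(0, 0), (10, 10), (1, 1), (9, 9)]
neg = [(0, 10), (10, 0), (1, 9), (9, 1)]

modules = train_m3(pos, neg, n_pos=2, n_neg=2)
labels = [predict_m3(modules, x) > 0 for x in [(0, 0), (10, 10), (0, 10), (10, 0)]]
```

In this toy setting the four small modules together separate data that no single module could, which is the point of the divide-and-conquer structure. A real system would substitute a stronger base classifier such as an SVM and distribute the module training across workers.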
Keywords/Search Tags: Min-Max modular network (M3), large-scale text classification, parallel machine learning, patent classification, assistant classifier module selection strategy (ACMSS)