Content Analysis Based Patent Mining Research

Posted on: 2009-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: F F Cao
Full Text: PDF
GTID: 2178360308479378
Subject: Computer software and theory
Abstract/Summary:
Patent Mining has flourished over the past decade. Earlier work on patent search and retrieval came mostly from the database community, but in recent years it has come from the Natural Language Processing (NLP) and Information Retrieval (IR) communities. The progress in Patent Mining can be attributed to two factors: the boom in statistical machine learning, which provided new methodology for Patent Mining and NLP tasks; and the improvement of NLP and IR technology itself. International patent evaluation campaigns and workshops provide a forum in which researchers and practitioners from the relevant communities can share ideas, approaches, perspectives, and experience from their work in progress.

In this thesis, we study the content characteristics of patent text and the data statistics of several patent corpora, and then analyse the main difficulties of the Patent Mining task. Viewed as an NLP problem, Patent Mining raises several issues: (1) The patent corpus is very large, on the order of several million documents. (2) Patent text covers all technology domains, and overlap between domains is common, which hurts text classification. (3) The distribution of patent text over the International Patent Classification (IPC) system is imbalanced, and training data in the main classes is insufficient. (4) The patent classification system differs from that of traditional text classification; in particular, the IPC system has a very large number of classes organised in a hierarchy. (5) A patent may carry multiple classification labels.

This thesis focuses on how to resolve the main problems of the Patent Mining task and on techniques that improve the performance of a patent mining system. Building on previous work, we propose several models and methods for the task. We focus on the following points: (1) We use an NLP-based text classification framework to handle the Patent Mining task. (2) We use inverted indexing, a common technique in the IR community, to speed up text classification. (3) We propose a class clustering method to mitigate the data imbalance problem. (4) We apply several similarity calculation methods to the Patent Mining task. (5) We propose several ranking methods for the class decision step, in particular a log-linear method and a system combination method based on the Rank-SVM model.

All of the work in this thesis is based on the NTCIR-7 Patent Mining evaluation task, and we build a reliable patent mining system using U.S. patent data and English translations of Japanese patent data.
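The combination of inverted indexing, similarity calculation, and class ranking described above can be illustrated with a minimal sketch. This is not the system built in the thesis: the class name `InvertedIndexClassifier`, the TF-IDF weighting, cosine similarity, and the summed-similarity class score are all assumptions chosen for brevity, and the toy documents are invented. The log-linear and Rank-SVM ranking methods are not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf(tokens, idf):
    """Return an L2-normalised TF-IDF vector as a {term: weight} dict."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class InvertedIndexClassifier:
    """Hypothetical nearest-neighbour classifier backed by an inverted index."""

    def __init__(self, docs):
        # docs: list of (tokens, labels); a patent may have several labels.
        n = len(docs)
        df = Counter(t for tokens, _ in docs for t in set(tokens))
        self.idf = {t: math.log(1 + n / d) for t, d in df.items()}
        self.labels = [set(labels) for _, labels in docs]
        # Inverted index: term -> list of (doc_id, normalised weight).
        self.postings = defaultdict(list)
        for doc_id, (tokens, _) in enumerate(docs):
            for term, weight in tfidf(tokens, self.idf).items():
                self.postings[term].append((doc_id, weight))

    def rank_classes(self, tokens, k=3):
        """Rank class labels by summed cosine similarity of the k nearest docs."""
        query = tfidf(tokens, self.idf)
        # Only documents sharing at least one term with the query are scored,
        # which is where the inverted index saves time on a large corpus.
        sims = defaultdict(float)
        for term, q_weight in query.items():
            for doc_id, d_weight in self.postings.get(term, ()):
                sims[doc_id] += q_weight * d_weight
        neighbours = sorted(sims.items(), key=lambda kv: -kv[1])[:k]
        class_scores = defaultdict(float)
        for doc_id, sim in neighbours:
            for label in self.labels[doc_id]:
                class_scores[label] += sim
        return sorted(class_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy training data (invented); labels are IPC class symbols used for illustration.
TRAIN = [
    (["engine", "fuel", "combustion"], ["F02"]),
    (["engine", "piston", "cylinder"], ["F02", "F16"]),
    (["neural", "network", "classifier"], ["G06"]),
]

clf = InvertedIndexClassifier(TRAIN)
ranked = clf.rank_classes(["fuel", "engine"], k=2)
print(ranked)
```

Because the query is matched only against the posting lists of its own terms, the cost of scoring depends on how many documents share vocabulary with the query rather than on the full corpus size. Returning a ranked list of classes, rather than a single class, fits the multi-label nature of patent classification.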
Keywords/Search Tags: Patent Mining, text classification, similarity calculation, ranking