Content Analysis Based Patent Mining Research

Posted on:2009-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:F F Cao

Full Text:PDF

GTID:2178360308479378

Subject:Computer software and theory

Abstract/Summary:

In the past decade, patent mining has flourished. Earlier work on patent search and retrieval came mainly from the database community. In recent years it has come increasingly from the Natural Language Processing (NLP) and Information Retrieval (IR) communities. Two factors explain this progress. First, the boom in statistical machine learning provided new methods for patent mining and NLP tasks. Second, NLP and IR technology itself improved. International patent evaluation campaigns and workshops give researchers and practitioners from these communities a forum to share ideas, approaches, perspectives, and experience from work in progress.

In this thesis, we study the content characteristics of patent text and compute data statistics on different patent corpora. We then analyse what makes the patent mining task difficult. From the NLP perspective, patent mining faces several problems:
(1) The patent corpus is huge, with several million patent documents.
(2) Patent text covers all technology domains. Overlap between domains is common, and this hurts text classification.
(3) Patent texts are distributed unevenly across the International Patent Classification (IPC) system, and the training data for the main classes is insufficient.
(4) The patent classification scheme differs from those used in traditional text classification. In particular, the IPC system is hierarchical and has a very large number of classes.
(5) A single patent can carry multiple classification labels.

This dissertation focuses on solving the main problems of the patent mining task and on techniques that improve the performance of patent mining systems. Building on previous work, we propose several models and methods for patent mining.
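
Problems (3) to (5) above can be made concrete with a small sketch. The code is hypothetical and is not taken from this thesis. It assumes IPC symbols are written in the common form `G06F 17/30` (section, class, subclass, main group/subgroup). It expands each symbol into its hierarchy levels and counts documents per label, so you can inspect the imbalance ratio at a chosen level. The names `ipc_levels`, `label_counts` and `imbalance_ratio` are illustrative only.

```python
import re
from collections import Counter

# Assumed symbol format, e.g. "G06F 17/30": section letter (A-H), two-digit
# class, subclass letter, then main group "/" subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")


def ipc_levels(code: str) -> tuple:
    """Split an IPC symbol into (section, class, subclass, group)."""
    m = IPC_PATTERN.match(code.strip())
    if m is None:
        raise ValueError(f"not an IPC symbol: {code!r}")
    section, cls, subclass, main, sub = m.groups()
    return (
        section,                                   # e.g. "G"
        section + cls,                             # e.g. "G06"
        section + cls + subclass,                  # e.g. "G06F"
        f"{section}{cls}{subclass} {main}/{sub}",  # e.g. "G06F 17/30"
    )


def label_counts(docs: list, level: int) -> Counter:
    """Count documents per label at one level (0=section ... 3=group).

    A multi-label patent is counted once for every distinct label it carries
    at that level, so one document can contribute to several classes.
    """
    counts = Counter()
    for codes in docs:
        counts.update({ipc_levels(c)[level] for c in codes})
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Largest class size divided by smallest class size (1.0 = balanced)."""
    return max(counts.values()) / min(counts.values())
```

Take four toy patents labelled `["G06F 17/30", "G06F 15/16"]`, `["G06F 17/30"]`, `["H04L 29/06", "G06F 15/16"]` and `["A61K 31/00"]`. Counting at level 2 (subclass) gives G06F: 3, H04L: 1 and A61K: 1, for an imbalance ratio of 3.0. Moving up or down the hierarchy changes both the number of classes and how skewed they are.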
We focus on the following issues:
(1) Applying an NLP-based text classification framework to the patent mining task.
(2) Using inverted indexing, a common technique in the IR community, to speed up text classification.
(3) Proposing a class clustering method to alleviate the data imbalance problem.
(4) Applying several similarity calculation methods to the patent mining task.
(5) Proposing several ranking methods for the class decision process, in particular a log-linear method and a system combination method based on the Rank-SVM model.
All of the work in this thesis is based on the Patent Mining evaluation task of NTCIR-7. Using U.S. patents and English translations of Japanese patent data, we build a credible system for the patent mining task.
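
Items (1), (2), (4) and (5) above can be illustrated together. The sketch below is hypothetical and does not reproduce the system built in this thesis. It assumes tokenised documents, TF-IDF weighting, cosine similarity and a similarity-weighted kNN vote. It also uses an inverted index, so scoring a query only visits training documents that share at least one term with it. `log_linear_combine` is a simple log-linear fusion of label scores from several systems. It stands in for the thesis's log-linear and Rank-SVM ranking methods and is not an implementation of them. All class and function names are illustrative.

```python
import math
from collections import Counter, defaultdict


class InvertedIndexKNN:
    """Hypothetical kNN label ranker over L2-normalised TF-IDF vectors.

    Training vectors are stored in an inverted index (term -> postings), so a
    query only touches documents that share at least one term with it.
    """

    def __init__(self, k: int = 3):
        self.k = k
        self.postings = defaultdict(list)  # term -> [(doc_id, weight), ...]
        self.labels = []                   # doc_id -> list of labels
        self.idf = {}

    def _vector(self, tokens):
        # TF-IDF weights for known terms, normalised to unit length.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, docs, labels):
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        self.idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
        self.labels = [list(ls) for ls in labels]
        for doc_id, tokens in enumerate(docs):
            for term, w in self._vector(tokens).items():
                self.postings[term].append((doc_id, w))
        return self

    def rank_labels(self, tokens):
        # Walking the postings accumulates dot products of unit vectors,
        # i.e. cosine similarity, for every training doc sharing a term.
        sims = defaultdict(float)
        for term, qw in self._vector(tokens).items():
            for doc_id, dw in self.postings[term]:
                sims[doc_id] += qw * dw
        top = sorted(sims.items(), key=lambda x: (-x[1], x[0]))[: self.k]
        # Each of the k nearest neighbours votes for all of its labels,
        # weighted by its similarity (this also yields multi-label output).
        votes = defaultdict(float)
        for doc_id, sim in top:
            for label in self.labels[doc_id]:
                votes[label] += sim
        return sorted(votes.items(), key=lambda x: (-x[1], x[0]))


def log_linear_combine(system_rankings, weights, floor=1e-6):
    """Fuse (label, score) lists from several systems with a log-linear
    model: score(label) = sum_i w_i * log(max(s_i(label), floor))."""
    tables = [dict(r) for r in system_rankings]
    labels = set().union(*tables)
    fused = {
        label: sum(w * math.log(max(t.get(label, 0.0), floor))
                   for t, w in zip(tables, weights))
        for label in labels
    }
    return sorted(fused.items(), key=lambda x: (-x[1], x[0]))
```

`InvertedIndexKNN(k=2).fit(docs, labels).rank_labels(query_tokens)` returns (label, score) pairs sorted by score. Keeping every label above a threshold, rather than only the first, is one way to produce multi-label output. The inverted index helps because a query typically shares terms with only part of a very large training collection, so most documents are never scored.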

Keywords/Search Tags:

Patent Mining, text classification, similarity calculation, Ranking

Related items

1	Course Similarity Calculation Using Efficient Manifold Ranking
2	Research On Key Technologies Of Patent-Value Mining Based On Hadoop
3	Technology Innovation Methods Based On Patent Citation Network Analysis And Text Mining Of Patents
4	Research On Chinese Patent Auxiliary Writing Technology
5	Research On Core Patent Recognition Method Based On Text Data Mining
6	The Study On Ranking And Similarity Calculation In Information Retrieval
7	Research And Application Of Text Mining In The Patent Automatic Classification Based On Neural Network
8	Studies On Key Techniques Of Text Classification And Mining For Specific Domains
9	Application Of Text Mining In Patent Literature Analysis
10	Research Of Patent Text Classification Algorithm Based On KNN