The Study Of Text Classification And Retrieval For Chinese Patent

Posted on:2012-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:S L Dan

GTID:2218330368987994

Subject:Computer application technology

Abstract/Summary:

In recent years, patent knowledge has attracted growing attention, and research on patent analysis and mining has increased. Machine learning provides the technical support for patent mining. Patent classification and retrieval are the basis for patent knowledge discovery and necessary tools for innovative designers. Patent data has a distinctive structure and a specialized vocabulary. Patent classification currently faces three main problems: reliance on machine word segmentation, high dimensionality, and low classification efficiency. Patent retrieval faces large data volumes, low retrieval efficiency, and the specialized needs of professional users. This thesis studies Chinese patent classification and retrieval in order to improve efficiency and to further support the mining of patent knowledge.

To address these problems, the thesis proposes a new patent classification model based on semantic disambiguation and manifold dimension reduction, and a patent retrieval model based on an index pool. Building on this work, we apply the theory of the Engineering Semantic Web to the problem of multi-conflict design in innovative engineering design, and apply methods for mining high-dimensional time series data to patent data, which should benefit innovative design.

In the traditional text classification procedure, features are extracted by machine word segmentation, which does not reflect the deep semantic knowledge contained in patent data. In the patent classification model based on semantic disambiguation and manifold dimension reduction, a semantic dictionary is introduced to eliminate noise words and construct semantic features, which also reduces the dimensionality of the text vectors. High dimensionality is another problem in patent classification, and we apply manifold learning algorithms to reduce it: on the one hand they seek the intrinsic dimension of the data, and on the other the dimensionality reduction improves classification efficiency. Experiments show that the two strategies can effectively improve the retrieval efficiency.

Many studies have used multiple indexing methods to improve retrieval efficiency. However, different applications (multi-language text, images, video, data, etc.) use different multi-index strategies. These strategies are effective but limited to their own applications and cannot be transferred to others. Moreover, no study has addressed index maintenance and management strategies. To solve these problems, this thesis presents a dynamic index pool model and gives the theoretical basis for building and optimizing indexes. The index pool model applies pooling technology to the management of multiple indexes and continually optimizes the query index structure according to user query feedback. This provides users with more efficient service and, at the same time, effectively reduces the system load. The validity of the index pool model is verified by applying it to patent retrieval.

Based on the research on patent classification and retrieval, we propose a novel method for the multi-conflict design problem that introduces the Engineering Semantic Web. Innovation and design are processes of resolving real-life conflicts; we use the Engineering Semantic Web to analyze and solve engineering design problems with multiple conflicts. For massive patent information, we use high-dimensional time series data mining methods to analyze the distribution of patents and to achieve cross-cutting, systematic, multi-pattern matching between invention principles and patent instances.
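The abstract does not give the internals of the dynamic index pool, so the following is only a minimal sketch of the general idea: a bounded pool of indexes whose membership is adjusted by query feedback. Everything here is an assumption made for illustration, not the thesis's design: the class name `IndexPool`, the per-field inverted indexes, the `capacity` limit, and the rule that a field replaces the least-queried pooled index once it has been queried more often.

```python
from collections import Counter, defaultdict


class IndexPool:
    """Hypothetical bounded pool of per-field inverted indexes."""

    def __init__(self, documents, capacity=2):
        self.documents = documents   # {doc_id: {field: text}}
        self.capacity = capacity     # maximum number of indexes kept
        self.indexes = {}            # field -> {term: set of doc_ids}
        self.usage = Counter()       # field -> number of queries seen

    def _build(self, field):
        # Build an inverted index over one field of every document.
        index = defaultdict(set)
        for doc_id, fields in self.documents.items():
            for term in fields.get(field, "").lower().split():
                index[term].add(doc_id)
        return index

    def _scan(self, field, term):
        # Fallback when the field has no pooled index: linear scan.
        return {doc_id for doc_id, fields in self.documents.items()
                if term in fields.get(field, "").lower().split()}

    def _admit(self, field):
        # Admit the field if there is room; otherwise replace the
        # least-queried pooled index only if this field is queried more.
        if len(self.indexes) < self.capacity:
            self.indexes[field] = self._build(field)
            return
        coldest = min(self.indexes, key=lambda f: self.usage[f])
        if self.usage[field] > self.usage[coldest]:
            del self.indexes[coldest]
            self.indexes[field] = self._build(field)

    def search(self, field, term):
        term = term.lower()
        self.usage[field] += 1       # query feedback drives the pool
        if field not in self.indexes:
            self._admit(field)
        if field in self.indexes:
            return set(self.indexes[field].get(term, set()))
        return self._scan(field, term)


# Toy data, invented for this example.
patents = {
    "CN1": {"title": "wind turbine blade",
            "abstract": "a blade for wind power",
            "claims": "a turbine blade comprising"},
    "CN2": {"title": "solar panel mount",
            "abstract": "a mount for solar power",
            "claims": "a mount comprising"},
}
pool = IndexPool(patents, capacity=2)
print(pool.search("abstract", "power"))
```

In this sketch a field that is rarely queried is served by a linear scan, while frequently queried fields earn a place in the pool, which is one simple way to trade index maintenance cost against query speed.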

Keywords/Search Tags:

Patent Classification, Patent Retrieval, Semantic Disambiguation, Manifold Dimension Reduction, Index Pool

Related items

1	Research Of Patent Document Analysis And Retrieval Based On Latent Semantic Analysis
2	Research Of Design Patent Images Classification
3	Research On Dynamic Monitoring And Analysis System Of Patent Information Based On Ontology
4	Application Of Uncertain Semantic Retrieval In Patent Intelligent Service Platform
5	Research And Implementation Of Patent Information Retrieval Experimental System
6	Research And Implementation Of Patent Mining Methods With Distinguishable Validity
7	Research On Patent Classification Technology Based On Latent Semantic Analysis
8	Research On Patent Retrieval And Core Patent Identification Methods
9	Construction Of A Vertical Search Engine For Patent Information
10	Research And Implementation Of Fusing Ontology And Users' Interests In Patent Information Retrieval System