Research On Multi-granularity Patent Text Clustering Fusing Attribute Extraction

Posted on:2016-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:D P Sun

Full Text:PDF

GTID:2308330461476537

Subject:Computer system architecture

Abstract/Summary:

With the development of science and technology, a growing number of countries and companies are paying attention to protecting their research findings, and patents, as a form of intellectual property protection, are receiving increasing attention. Because China's economic development started late, most domestic enterprises, compared with foreign companies, lacked an understanding of patents in their early stages of development. Raising awareness of patent protection and strengthening patent analysis capabilities have therefore become increasingly important for domestic enterprises. A patent text consists of structured and unstructured data; this thesis analyzes the latter. The unstructured data has several parts, including the title, the abstract, and the patent claims. The title often describes the key technology of a patent, and the abstract summarizes its content. Both contain a wealth of patent information, so we choose patent titles and abstracts as the objects of study.

Taking patent abstracts as the object of study, we propose an information extraction method based on conditional random fields (CRFs). First, treating attributes and attribute values as named entities, this thesis trains a CRF model on the training set and then uses the model to extract attributes and attribute values from the test set. Second, we use rules to predict the relations between attributes and attribute values. The accuracy, recall and F-score of this prediction are 80.8%, 81.2% and 81.0% respectively. Finally, using the extraction results, this thesis analyzes patents and compares patents of the same type.

For patent titles and abstracts, this thesis proposes a patent text clustering method. Attributes and attribute values are extracted with sequence labeling. The abstract information is represented at two granularities: the text and the attribute-value pairs. This information is encoded with distributed representations. The patent title, abstract and attribute-value pairs are then fused linearly with different weights, and the patent texts are clustered with spectral clustering. The accuracy, recall and F-value of the results show that the proposed method is feasible and effective, demonstrating the value of the work.
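The weighted linear fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the fusion is applied to the vectors or to their similarities. The sketch below assumes the latter and combines per-view cosine similarities into a single affinity matrix. The function names, the view names (`title`, `abstract`, `attr_value`), the toy vectors and the weights are all hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_affinity(patents, weights):
    """Build an n x n affinity matrix as a weighted sum of per-view similarities.

    patents: list of dicts mapping a view name to that view's vector.
    weights: dict mapping a view name to a non-negative weight
             (hypothetical values; the thesis does not report its weights here).
    """
    total = sum(weights.values())
    n = len(patents)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = sum(
                w * cosine(patents[i][view], patents[j][view])
                for view, w in weights.items()
            ) / total
            # Cosine similarity is symmetric, so fill both halves at once.
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Toy 2-d vectors standing in for distributed representations (made up).
patents = [
    {"title": [1.0, 0.0], "abstract": [1.0, 0.0], "attr_value": [1.0, 0.0]},
    {"title": [1.0, 0.0], "abstract": [0.0, 1.0], "attr_value": [1.0, 0.0]},
    {"title": [0.0, 1.0], "abstract": [0.0, 1.0], "attr_value": [0.0, 1.0]},
]
weights = {"title": 0.3, "abstract": 0.4, "attr_value": 0.3}

affinity = fused_affinity(patents, weights)
for row in affinity:
    print([round(x, 2) for x in row])
```

A matrix of this kind could then be handed to a spectral clustering implementation that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`. Spectral clustering expects non-negative affinities, so real embeddings with negative cosine values would need rescaling first.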

Keywords/Search Tags:

Attribute Extraction, Conditional Random Fields, Multi-granularity, Distributed Representation

Related items

1	Research On Social Network Character Attribute Extraction Method Based On Statistical Learning
2	Research And Implementation Of Personal Attribute Extraction In Chinese
3	Metadata Extraction Based On Third-order Conditional Random Fields
4	Research On Personnel Resume Intelligent Extraction System Based On Conditional Random Fields
5	Research Of The Automatic Metadata Extraction Based On The Conditional Random Fields
6	Information Recognition And Extraction From Chinese Periodical Papers Based On Conditional Random Fields
7	SAR Image Change Detection Based On Conditional Random Fields
8	Research On Online Detection Method Of Reputation Fraud Campaign Based On Conditional Random Fields
9	Web Information Extraction Research Based On Conditional Random Fields
10	Hierarchical Information Extraction From Research Papers Based On Conditional Random Fields