Research On The Construction Of Chinese Patent Knowledge Graph

Posted on: 2020-03-26    Degree: Master    Type: Thesis
Country: China    Candidate: X R Lv    Full Text: PDF
GTID: 2438330575458811    Subject: Computer technology
Abstract/Summary:
A patent is an invention-creation protected by law as intellectual property, and patents collectively record a large number of scientific and technological achievements and innovative technologies. With the development of society and the progress of science and technology, awareness of the need to protect scientific research achievements is gradually increasing. In-depth mining and analysis of the knowledge resources contained in existing patents is a prerequisite for scientific and technological innovation. This thesis constructs a patent knowledge graph for the field of new energy vehicles in order to represent, analyze and mine patent knowledge in this field, and thereby to analyze the relationships between patents more effectively and to optimize patent retrieval.

A knowledge graph is a structured semantic knowledge base that describes concepts in the physical world, and the relationships between those concepts, in symbolic form. It can express massive amounts of information in a way that is closer to human cognition, and it provides a better way to organize and manage that information. A knowledge graph is composed of entity-relation-entity triples, together with entities and their associated attribute-value pairs, which form a networked knowledge structure. In constructing the patent knowledge graph for new energy vehicles, this thesis focuses on three extraction methods: for patent terms, for relations between patent terms, and for patent attribute values. The main contents can be summarized as follows.

First, a patent domain term extraction method based on multi-feature fusion and a BiLSTM-CRF model is proposed. To improve the accuracy and recall of term extraction in the Chinese patent domain, the method takes a deep learning approach: with part-of-speech and dependency relations as features, a patent domain term extraction model (BiLSTM-CRF) combines Conditional Random Fields (CRF) and bi-directional long short-term memory (BiLSTM) on the basis of multi-feature fusion. Building on the two explicit features of part of speech and dependency, a two-layer bidirectional LSTM network mines the temporal and semantic information in the data. This overcomes disadvantages of traditional methods, such as weak generality and the inability to capture implicit information in the context, while the CRF layer models the dependencies among the output tags. Experimental results show that this deep learning method is effective for domain term extraction, achieving 89.79% accuracy and 85.35% recall.

Second, a relation extraction method for the patent domain based on a keyword strategy and an Attention+BiLSTM model is proposed. Category keyword features, obtained for each sentence with an improved TextRank keyword extraction algorithm, are added to the vector representation of the patent text. A BiLSTM neural network and an attention mechanism are employed to mine the temporal information and the sentence-level global features, and a pooling layer is added to obtain the local features of the text. Finally, the global and local features are fused, and the final classification result is output by a softmax classifier. Adding category keywords makes the categories easier to distinguish. Extensive experiments show that the proposed model outperforms state-of-the-art neural models in patent term relation extraction, achieving 90.85% accuracy and 90.64% recall.

Third, a patent attribute value extraction method based on a BERT-BiGRU-CRF model is proposed. First, the BERT model is used to encode the patent text as a low-dimensional vector matrix. Then a BiGRU model, drawing on the temporal and semantic information in the data, computes the probability of each tag. Finally, the CRF layer obtains the optimal tag sequence according to the dependencies between preceding and following tags. The backpropagation algorithm is used to optimize the model, and Dropout is used to make the model more robust. Experimental results show that the BERT-BiGRU-CRF model is effective for patent attribute value extraction, achieving 85.09% accuracy and 80.03% recall.
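
The last step of both CRF-based models, choosing the best tag sequence from per-token tag scores and the dependencies between adjacent tags, is commonly carried out with Viterbi decoding. The sketch below illustrates that idea only and is not the thesis's implementation: the function name `viterbi_decode`, the B/I/O tag set, and all score values are assumptions made up for illustration. In a real model the emission scores would come from the BiLSTM or BiGRU layer and the transition scores would be learned by the CRF layer.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][k]   : score of tag k at token t (hypothetical encoder output)
    transitions[i][j] : score of moving from tag i to tag j (hypothetical CRF weights)
    """
    n_tags = len(tags)
    # Best score of any path that ends in each tag at the first token.
    scores = list(emissions[0])
    backpointers = []

    for emit in emissions[1:]:
        new_scores = []
        pointers = []
        for j in range(n_tags):
            # Best previous tag for a path that ends in tag j at this token.
            best_prev = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    best_last = max(range(n_tags), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


# Illustrative values only: tags mark the Beginning, Inside, or Outside of a term.
TAGS = ["B", "I", "O"]
# Rows are the previous tag, columns the next tag; "O" -> "I" is heavily penalised.
TRANSITIONS = [
    [0.0, 0.0, 0.0],    # from B
    [0.0, 0.0, 0.0],    # from I
    [0.0, -10.0, 0.0],  # from O
]
# Scores for two tokens; taking the best tag per token would give the invalid "O", "I".
EMISSIONS = [
    [0.5, 0.0, 1.0],
    [0.0, 2.0, 0.0],
]

decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(decoded)
```

With these made-up scores, picking the highest-scoring tag for each token independently would label the first token "O" and the second "I", a sequence the transition scores penalise. Decoding over the whole sequence instead lets the transition scores steer the result toward a consistent labeling, which is the role the abstract assigns to the CRF layer.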
Keywords/Search Tags: Patent Knowledge Graph, BiLSTM, Attention, BERT, Terminology Extraction, Relation Extraction