
Patent Problem Understanding Model And Algorithm Research

Posted on: 2023-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: H C Zhang
Full Text: PDF
GTID: 2568306794486994
Subject: Computer technology
Abstract/Summary:
In recent years, intellectual property protection has received increasing attention worldwide, and the number of patents, an important form of intellectual property, is growing year by year. Making better use of this growing body of patents first requires understanding the patent problem. The patent problem has two aspects: the issue sentence in the patent text, and the questions asked about a patent in patent Q&A. A patent, as a complex technical text, is built around its issue sentence. Because the issue sentence describes the problem that the patent is intended to solve, identifying it is the key to understanding the patent problem. Automatic issue sentence recognition currently faces two challenges: training datasets for patent issue sentences are scarce, and existing methods cannot understand and recognize issue sentences accurately. In a patent text, issue sentences appear mainly in the "background technology" paragraph, so this paragraph is split into sentences to obtain candidate issue sentences (which include both issue sentences and non-issue sentences), and these candidates serve as the training dataset. In addition, this thesis proposes a patent issue recognition model based on a graph convolutional neural network. The model extracts two sets of features from each candidate issue sentence. The first set consists of the transitional conjunctions in the sentence together with three classes of sentiment-bearing words: adverbs, adjectives, and verbs. The second set captures the association between the sentence and the patent claims, derived from the structural characteristics of the patent text.

Beyond issue sentences in patent texts, understanding the questions asked about a patent is the other problem in patent problem understanding. People often want to know not only what problem a patent solves but also how the patent text addresses their individual needs, which calls for Q&A models in the patent domain. Research on patent Q&A models currently faces three challenges: publicly available datasets are scarce; questions about patent texts are usually long, which makes interaction between question and text difficult; and patent texts span many domains with widely varying vocabulary that is hard to encode accurately. To address the scarcity of datasets, this thesis follows the widely used public multi-round Q&A dataset CoQA to construct a multi-round Q&A dataset for the patent domain, with five rounds of Q&A designed for each patent. In addition, this thesis proposes a patent QA model based on the attention mechanism. To overcome the interaction challenge, the model combines the attention mechanism with a Gated Recurrent Unit, preserving information between the question and distant words in the patent text without losing position and context information. To overcome the challenge of inaccurate encoding, the pre-trained model BERT encodes the words while a word-level attention mechanism fine-tunes the word vectors. On the patent datasets, the proposed patent issue recognition model based on a graph convolutional neural network and the proposed patent QA model based on the attention mechanism are each compared with comparison models, and the experimental results show that the proposed models outperform them.
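The candidate-sentence step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the English cue-word list, and the example paragraph are all hypothetical, and it assumes that the conjunction feature refers to contrastive words such as "however". It only shows splitting a background paragraph into candidate issue sentences and counting conjunction cues as one simple feature.

```python
import re

# Hypothetical cue list (assumption): the thesis does not publish its lexicon.
CONTRAST_CUES = {"but", "however", "although", "yet", "nevertheless"}


def split_candidates(background: str) -> list[str]:
    """Split a background paragraph into candidate issue sentences."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?;])\s+", background.strip())
    return [p for p in parts if p]


def contrast_cue_count(sentence: str) -> int:
    """Count cue conjunctions in a sentence (one illustrative feature)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for t in tokens if t in CONTRAST_CUES)


# Made-up background paragraph, for illustration only.
background = (
    "Existing clamps are made of steel. "
    "However, they corrode quickly in humid environments. "
    "A coating is sometimes applied."
)

candidates = split_candidates(background)
features = [(s, contrast_cue_count(s)) for s in candidates]
```

In this sketch every sentence of the paragraph becomes a candidate, and the cue count is just one input a classifier could use; the sentiment-bearing word features and the sentence-to-claims association features mentioned in the abstract are not modeled here.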
Keywords/Search Tags: patent problem understanding, issue recognition, question answering model, graph convolutional neural network, attention mechanism, natural language processing