
Research On Automatic Generation Technology Of Patent Specification Abstract Based On Hybrid Model

Posted on: 2024-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Zheng
GTID: 2568307094457434
Subject: Computer system architecture
Abstract/Summary:
A patent specification describes and explains in detail the technical features, structure, and functions of the invention or utility model to be protected, which gives it great research value. As the number of patent applications continues to grow, two problems in patent analysis need to be solved first: how to accurately retrieve the required technical information from massive patent databases, and how to quickly understand the innovation, practicability, and commercial value of a given patented technology. The abstract of a patent specification gives readers a brief, general description of the patent so that they can quickly grasp its technical content and points of innovation. At present, most research on automatic abstract generation targets short texts, and most generated abstracts are a single sentence. Because patent specifications are longer and contain more complex concepts and technical terms, existing models cannot fully summarize their core content and cannot meet researchers' needs for patent abstracts.

Against this background, this thesis takes the patent specification and its abstract as the research object and proposes two hybrid models for automatically generating patent specification abstracts:

(1) A patent specification abstract generation method based on Patent SNU. The model first uses an improved SummaRuNNer-based extractive model to extract key sentences from the patent specification and form a transitional document. An abstractive model based on NEZHA-UniLM then takes the transitional document as input and refines it into an accurate and concise abstract of the patent specification. In the fine-tuning stage of the abstractive model, sparse Softmax is introduced to keep Softmax from over-learning and to improve the quality of the generated summaries.

(2) A patent specification abstract generation method based on the DBL model. The model uses a DGCNN-based extractive model to obtain a subset of the patent specification, then feeds that subset together with the reference abstracts into a generative model based on BART-LED for training, and finally obtains the abstract of the patent specification. In the extractive model, a non-greedy algorithm first produces the sequence-labeled corpus, "RoBERTa + average pooling" then generates the sentence vectors, and the DGCNN model forms the main body of the labeling model. In the abstractive model, the pre-trained BART model first initializes the generative model, and the LED (Longformer-Encoder-Decoder) model is then used for fine-tuning. The model also introduces a BIO Copy mechanism to ensure that the generated abstract is faithful to the original text.

A dataset was built from patent specifications and their abstracts collected from the Patent Star search system, and the effectiveness of the models was verified on it. Three metrics, ROUGE-1, ROUGE-2, and ROUGE-L, were selected to evaluate abstract quality automatically, and a manual evaluation was also carried out. The experimental results show that the methods proposed in this thesis generate patent specification abstracts with complete structure and fluent language. Measured by ROUGE score against the other benchmark models, the Patent SNU model raises ROUGE-1, ROUGE-2, and ROUGE-L by at least 5.51%, 4.75%, and 4.55%, respectively, and the DBL model raises ROUGE-1, ROUGE-2, and ROUGE-L by at least 5.58%, 6.6%, and 3.95%, respectively.
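For readers unfamiliar with the three ROUGE metrics named in the abstract, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L can be computed over token lists. It is an illustration only, not the evaluation code used in the thesis: the function names are hypothetical, the F1 variant of each score is an assumption (the abstract does not say whether recall, precision, or F1 was reported), and the tokenization is left to the caller (for Chinese text, character-level tokens are a common choice).

```python
from collections import Counter


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """ROUGE-N (F1 variant, an assumption): clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & takes the min count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L (F1 variant, an assumption): based on the LCS length."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Toy example (made-up sentences, not data from the thesis).
candidate = "the model extracts key sentences".split()
reference = "the model selects key sentences".split()
scores = {
    "ROUGE-1": rouge_n(candidate, reference, 1),
    "ROUGE-2": rouge_n(candidate, reference, 2),
    "ROUGE-L": rouge_l(candidate, reference),
}
```

ROUGE-1 and ROUGE-2 reward shared words and word pairs, while ROUGE-L rewards words that appear in the same order without requiring them to be adjacent, which is why the three are usually reported together.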
Keywords/Search Tags:Patent specification summarization, automatic text summarization, hybrid abstract generation model, DGCNN model, BIO Copy mechanism