Chinese Patent Title And Abstract Generation Technology Research

Posted on: 2022-07-15  Degree: Master  Type: Thesis
Country: China  Candidate: Z Cui
GTID: 2518306311456274  Subject: Electronics and Communications Engineering
Abstract/Summary:
As carriers of technical information, patents contain more than 90% of the world's latest technologies. Countries and enterprises can draw on the technical information in published patents, conduct further research on that basis, and generate substantial economic value, so patents play a very important role in the information age. However, because applicants do not want others to learn the specific content of a patent before examination, they often write hypernyms in place of specific terms. As a result, relevant patents are missed in patent searches and this large information resource cannot be fully used. As the number of patent applications grows, deep processing of patents by human experts is costly and slow and cannot meet the current demand for patent retrieval. In this paper, a patent deep processing model is trained with deep learning methods on a Chinese medicine patent data set to regenerate the original patent titles and abstracts. Experimental results show that the regenerated titles and abstracts contain more of the key information in the patents, which improves their readability and retrievability and helps make full use of the information resources in patents. The main innovations and contributions of this paper are as follows:

(1) A method for generating Chinese patent titles by fusing sememes is proposed. Building on the Skip-gram model, an attention mechanism is used to introduce HowNet sememes, which addresses polysemy and lets the model capture the specific meaning of a word in its current context more accurately. A dedicated set of sememes is also constructed for the Chinese patent data set to strengthen the word vector representations of Chinese medicine technical terms, and the resulting word vectors are fed into a pointer model to generate Chinese patent titles. Experimental results show that the generated titles contain the important information required and that the method outperforms the baseline methods selected in the experiments. The F1 values of ROUGE-1, ROUGE-2 and ROUGE-L are 0.791, 0.695 and 0.796, respectively.

(2) A method for extracting patent specification content with BERT is proposed. First, sentence vectors for the specification are obtained with the pre-trained BERT language model. The sentence vectors are then clustered with the K-means algorithm, which divides all sentences into eight categories. Finally, the sentence whose vector is closest to the cluster center is selected from each category as the extracted content. Experimental results show that the method extracts the important content of the specification and outperforms the baseline methods selected in the experiments. The F1 values of ROUGE-1, ROUGE-2 and ROUGE-L are 0.394, 0.207 and 0.324, respectively.

(3) A method for generating Chinese patent abstracts by fusing facts from the original text is proposed. First, the TextRank algorithm is used to extract the key sentences of the original text, and the LTP tool from Harbin Institute of Technology is used to extract triples from those key sentences; the extracted triples serve as the factual description of the original text. The original text and the extracted triples are fed into two separate Transformer encoders. At each decoding step, the decoder computes attention over the outputs of both encoders, yielding a context vector that contains factual information and guides abstract generation. Experimental results show that the method generates Chinese patent abstracts that meet retrieval requirements and outperforms the other methods selected in the experiments. The F1 values of ROUGE-1, ROUGE-2 and ROUGE-L are 0.471, 0.248 and 0.416, respectively.

To sum up, this paper uses deep learning methods, tailored to the characteristics of the Chinese medicine patent data set, to regenerate patent titles and abstracts. The three methods, generating Chinese patent titles by fusing sememes, extracting patent specification content with BERT, and generating Chinese patent abstracts by fusing facts from the original text, reduce the labor cost of patent deep processing and are of great significance for improving the precision and recall of patent retrieval.
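The extraction step in contribution (2), clustering sentence vectors with K-means and keeping the sentence nearest each cluster center, can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names (`kmeans`, `extract_representatives`), the random initialisation, and the toy 2-D vectors standing in for BERT sentence vectors are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input positions (an assumption;
    # the abstract does not say how the clusters were initialised).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every vector.
        new_assign = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def extract_representatives(sentences, vectors, k=8):
    """Keep, per cluster, the sentence whose vector is nearest the centroid."""
    centroids, assign = kmeans(vectors, k)
    chosen = []
    for c, centre in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: euclidean(vectors[i], centre)))
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    # Toy stand-ins for sentence vectors; real input would be BERT embeddings.
    toy_sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
    toy_vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0), (9.1, 0.2)]
    print(extract_representatives(toy_sentences, toy_vectors, k=3))
```

The default `k=8` mirrors the eight categories mentioned in the abstract. Returning the chosen sentences in document order is a design choice of this sketch so that the extracted content stays readable.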
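All three contributions are evaluated with ROUGE-1, ROUGE-2 and ROUGE-L F1 values. As a rough guide to what those numbers measure, here is a minimal sketch of the standard definitions: clipped n-gram overlap for ROUGE-N and longest common subsequence for ROUGE-L. The character-level tokenisation of the Chinese example strings and the plain harmonic-mean F1 (equal weight on precision and recall) are assumptions; the abstract does not state which ROUGE tooling or settings were used.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F1 from clipped n-gram matches."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Hypothetical title pair, split into characters as a simple tokenisation.
    generated = list("一种中药组合物")
    original = list("一种中药组合物及其制备方法")
    for name, score in [
        ("ROUGE-1", rouge_n_f1(generated, original, 1)),
        ("ROUGE-2", rouge_n_f1(generated, original, 2)),
        ("ROUGE-L", rouge_l_f1(generated, original)),
    ]:
        print(name, round(score, 3))
```

Scores computed this way depend on the tokenisation, so they are only comparable with the reported values if the same segmentation is used.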
Keywords/Search Tags: Chinese patent, title generation, text summarization, pointer network, content extraction