| Patents are the world’s largest source of technical information, containing more than 90% of the world’s scientific and technological information. As a core element of intellectual property, they reflect the level of scientific and technological development and are an important driving force for disseminating scientific and technological achievements and promoting innovation. Patent analysis makes it possible to understand the current state of a field and to grasp technological hotspots and trends in time, so that technological opportunities can be identified, strategic plans made, and the competitiveness of enterprises improved. However, the workload of patent writing is huge. In addition to describing technical information clearly, a patent also contains a great deal of economic and legal information. Moreover, the language must be strictly standardized, and sentences and wording must be rigorous to prevent infringement. As a result, drafting has always been a difficult problem for patent applicants. Using a self-built new energy vehicle patent dataset, this paper explores the characteristics of patent text data in depth and, with the goal of assisting writing, studies three key technologies: keyword extraction, text classification, and similarity calculation. The methods proposed in this paper are all tailored to patent text. Experimental results show that they achieve good results on patent datasets and are valuable for downstream work such as deep patent processing, patent infringement analysis, and the construction of patent-assisted writing systems. The main innovations and contributions of this paper are as follows: (1) A method for extracting Chinese patent keywords that integrates sememe and
Wubi features is proposed. By fusing the BERT vector with the sememe vector and the Wubi vector, the method captures both the semantic features and the glyph features of the input sequence. The feature vectors are then weighted by word frequency, which enriches the global information of the feature representation, and are finally passed through a BiLSTM-CRF model to complete keyword extraction on the patent dataset. Experimental results show that the keywords extracted by this method are more representative of the original text and that the method outperforms the baseline models, reaching an F1 score of 84.90%. (2) A Chinese patent text classification method based on feature fusion is proposed. The vocabulary is updated by extracting new important proper nouns, and the sentence vector obtained from pre-trained BERT is fused with the important proper noun vectors, using the TF-IDF value of each proper noun as its weight. This alleviates the poor classification results caused by the large number of out-of-vocabulary (unregistered) words in patent texts. Experimental results show that the method outperforms the baseline models on patent text classification, reaching an F1 score of 81.23%. (3) A WIA-MA-Tree-LSTM similarity calculation method that incorporates external patent features is proposed. The multi-head attention mechanism is combined with Tree-LSTM, the patent text (W) is represented as a dependency tree, and external features of the patent are introduced on this basis; in addition, the features of one text in a text pair are taken as the external feature input of the other text. First, the first MA-Tree-LSTM layer introduces the patent's inventor (I) and applicant (A) information as external features through the multi-head attention mechanism, and the second MA-Tree-LSTM layer uses the features of one text
in the text pair as the external feature input of the other text. This input acts on the child nodes of the Tree-LSTM and assigns them different weights, from which the similarity value of the text pair is calculated. The model thus makes each text pair focus more on its similar parts while also considering the hidden associations contained in the patent application information, which alleviates the low accuracy caused by the lack of semantic structure information in existing methods. Experimental results show that the proposed method calculates similarity on the patent dataset more accurately and outperforms the baseline models, with a Pearson correlation coefficient of 0.71 and a mean squared error as low as 0.29. In summary, based on the characteristics of patent datasets in the new energy vehicle field, this paper applies deep learning to the key technologies of patent-assisted writing: keyword extraction, text classification, and similarity calculation over patents' basic information and abstracts. The proposed models solve some of the problems in patent-assisted writing to a certain extent and are significant for downstream research in this area. The results are also significant for improving the efficiency of patent application writing, avoiding patent infringement, making effective use of research resources, and avoiding duplicated technical research. |
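
The abstract reports F1 scores for keyword extraction and classification, and a Pearson correlation coefficient and mean squared error for similarity calculation. As a minimal sketch of how such metrics are conventionally computed, the snippet below uses only the Python standard library. The function names and the toy numbers are illustrative assumptions; they are not the paper's evaluation code or data.

```python
import math


def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(xs, ys):
    """Average squared difference between gold and predicted scores."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy values, invented for illustration only.
gold_similarity = [0.0, 0.5, 1.0, 0.25]
predicted_similarity = [0.1, 0.4, 0.9, 0.35]

print(f1_score(tp=8, fp=2, fn=2))
print(pearson(gold_similarity, predicted_similarity))
print(mean_squared_error(gold_similarity, predicted_similarity))
```

A higher Pearson coefficient means the predicted similarities track the gold similarities more closely, while a lower mean squared error means the predicted values are closer in magnitude, which is why the two are reported together.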