| As the Chinese Government vigorously promotes a strategy of revitalizing the country through science and education, its market economy is booming and the number of patents is rising substantially. Patents have become an essential competitive asset for enterprises, regions, and even countries. Understanding the current state of technological development in a field is a prerequisite for scientific and technological innovation, and mining useful information from patents helps researchers improve existing technologies and find points of innovative breakthrough. A patent abstract is an independent short text that briefly summarizes the patent while conveying the same amount of information as the full document. This thesis therefore focuses on patents: it studies the classification of patent abstracts and takes the models that are currently widely used and effective as the starting point for improving the performance of a deep learning model. The specific contributions are as follows. First, a patent-oriented knowledge graph is built. The quality of prior knowledge directly affects the performance of a classification model, but the short text of a patent abstract contains little feature information and lacks usable prior knowledge. To address this, the thesis proposes using a knowledge graph to expand the feature information. By analyzing the characteristics of the data on patent websites, the thesis defines the entities of the knowledge graph and then manually defines the relationships between them. Once the triple data of the knowledge graph has been obtained, the graph database Neo4j is used to store the constructed graph. Second, a short-text classification model for patent abstracts is proposed that integrates the knowledge graph with BERT-CNN. The BERT (Bidirectional Encoder Representations from Transformers) model performs feature extraction. At the same time, TransH is applied to learn representations of the entities and relationships in the patent knowledge graph, so that entity information is expressed as dense, low-dimensional entity vectors. To expand the feature information of the text, the entity vectors of the patent knowledge graph are concatenated with the feature vectors output by the BERT model. Finally, a CNN (Convolutional Neural Network) extracts local feature information and outputs the classification result for the abstract. The model is built with the deep learning framework PyTorch, and its effectiveness and superiority are demonstrated by comparing it with control-group models, with commonly used word-vector techniques, and with classification models proposed by other scholars. Third, with the model achieving better results, the ALBERT model is used in place of BERT to obtain semantic information and thereby reduce the number of model parameters. ALBERT overcomes the obstacles to scaling up pretrained models through two parameter-reduction techniques. Experiments on different data sets show that ALBERT significantly reduces the number of parameters while maintaining classification performance, which speeds up the training of the classification model. In addition, the short-text classification model for patent abstracts proposed in this thesis, which integrates the knowledge graph with deep learning, achieves desirable results on different data sets. |
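
The TransH step mentioned in the abstract can be illustrated with a minimal sketch of its scoring function. The abstract does not describe how TransH was implemented in the thesis, so everything below is an assumption made for illustration: the function names, the toy three-dimensional vectors, and the choice of plain Python. TransH projects the head and tail entity vectors onto a relation-specific hyperplane with unit normal w_r and scores a triple as the squared distance ||h_perp + d_r - t_perp||^2, where d_r is the relation's translation vector and a lower score means a more plausible triple.

```python
import math


def project_onto_hyperplane(vec, unit_normal):
    """Remove the component of vec along unit_normal (projection onto the hyperplane)."""
    dot = sum(v * n for v, n in zip(vec, unit_normal))
    return [v - dot * n for v, n in zip(vec, unit_normal)]


def transh_score(head, tail, normal, translation):
    """TransH dissimilarity ||h_perp + d_r - t_perp||^2; lower means more plausible."""
    # TransH constrains the hyperplane normal to unit length, so normalize it first.
    length = math.sqrt(sum(n * n for n in normal))
    unit_normal = [n / length for n in normal]
    h_perp = project_onto_hyperplane(head, unit_normal)
    t_perp = project_onto_hyperplane(tail, unit_normal)
    return sum((h + d - t) ** 2 for h, d, t in zip(h_perp, translation, t_perp))


# Hypothetical toy vectors; real embeddings would be learned from the graph's triples.
head_entity = [1.0, 2.0, 3.0]
tail_entity = [2.0, 2.0, -1.0]
unrelated_entity = [5.0, 5.0, 5.0]
w_r = [0.0, 0.0, 1.0]  # normal of the relation's hyperplane
d_r = [1.0, 0.0, 0.0]  # translation vector lying in that hyperplane

matching = transh_score(head_entity, tail_entity, w_r, d_r)
mismatching = transh_score(head_entity, unrelated_entity, w_r, d_r)
print(matching, mismatching)
```

In this toy setup the matching triple scores lower than the mismatching one. In a trained model, the learned entity vectors (not the scores) would be the quantities concatenated with the BERT features, as the abstract describes.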