Automatic Classification Of Chinese Patent Text Based On Deep Learning

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: P Cheng
Full Text: PDF
GTID: 2518306734487694
Subject: Applied Statistics
Abstract/Summary:
Patent text is an important carrier of many kinds of information, so its automatic classification is a significant research problem. With the advent of the big data era, intellectual property is receiving more and more attention. As the number of patent documents grows, the problem is how to retrieve, classify and manage them correctly. At present, patent classification is done mainly by hand, which is time-consuming and cannot guarantee accuracy, so automated classification is urgently needed to speed up the classification and review of patents.

In view of this situation, this paper carries out the following research on patent text classification. First, a machine learning model for Chinese patent text classification based on Word2Vec and logistic regression is proposed. The method represents patent text with word vectors generated by Word2Vec and then trains a logistic regression model on a text corpus that combines the patent description and abstract, thereby classifying patent text automatically. In the experiments, the proposed method achieves good classification results, with classification accuracy for individual categories reaching 83.6%. Compared with the k-nearest neighbor algorithm, the model shows significant improvements in precision, recall and F1-value. In addition, compared with one-hot encoding and TF-IDF, the Word2Vec model significantly improves classification results, but it cannot solve the polysemy problem.

Therefore, to address the polysemy problem that Word2Vec cannot handle in text representation, a Chinese patent text classification method based on the BERT model is proposed. This method captures word order, context and grammatical information in sentences and uses the Transformer encoder to obtain dynamic word vectors; that is, the same word receives different vector representations in different contexts, which improves the representational power of the word vectors. The experimental results show that the BERT-based model achieves better precision, recall and F1-value than the Word2Vec-based model. The patent text classification methods proposed in this paper can serve as a reference for research on and application of automatic patent text classification.
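
The first method in the abstract (represent a document with word vectors, then classify it with logistic regression and score it with precision, recall and F1) can be illustrated with a minimal sketch. This is not the thesis implementation: the two-dimensional vectors in `WORD_VECTORS` are hypothetical stand-ins for trained Word2Vec output, averaging the word vectors is one common way to build a document vector and is assumed here, the English tokens and the two-class setup are simplifications, and every function name is invented for illustration.

```python
import math

# Hypothetical 2-dimensional vectors standing in for trained Word2Vec output.
WORD_VECTORS = {
    "battery": [0.9, 0.1],
    "electrode": [0.8, 0.2],
    "voltage": [0.7, 0.1],
    "enzyme": [0.1, 0.9],
    "protein": [0.2, 0.8],
    "cell": [0.5, 0.5],  # polysemous word: a static model gives it one vector
}


def document_vector(tokens):
    """Average the vectors of known tokens (assumed pooling strategy)."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit a binary logistic regression by batch gradient descent on log loss."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += error * x[i]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy corpus: label 1 = electrical patents, label 0 = biochemical patents.
TRAIN_DOCS = [
    (["battery", "electrode", "voltage"], 1),
    (["battery", "cell", "voltage"], 1),
    (["enzyme", "protein"], 0),
    (["protein", "cell", "enzyme"], 0),
]

X = [document_vector(tokens) for tokens, _ in TRAIN_DOCS]
y = [label for _, label in TRAIN_DOCS]
weights, bias = train_logistic_regression(X, y)
predictions = [predict(weights, bias, x) for x in X]
precision, recall, f1 = precision_recall_f1(y, predictions)
```

In this sketch the word "cell" has a single fixed vector whether it appears in a battery document or a biology document, which is the polysemy limitation the abstract attributes to Word2Vec. A contextual model such as BERT would instead produce a different vector for each occurrence.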
Keywords/Search Tags: Chinese patent, text classification, BERT, Word2Vec, logistic regression, deep learning