Automatic Classification Of Chinese Patent Text Based On Deep Learning

Posted on:2022-06-08

Degree:Master

Type:Thesis

Country:China

Candidate:P Cheng

GTID:2518306734487694

Subject:Applied Statistics

Abstract/Summary:

Patent text is an important carrier of many kinds of information, so its automatic classification is a significant research problem. With the advent of the big data era, intellectual property is receiving more and more attention. As the number of patent documents grows, the problem is how to retrieve, classify, and manage them correctly. At present, patents are classified mainly by hand, which is time-consuming and cannot guarantee classification accuracy, so automated patent classification is urgently needed to speed up the classification and review of patents. In view of this situation, this thesis carries out the following research on patent text classification.

First, a Chinese patent text classification model based on Word2Vec and logistic regression is proposed. The method represents patent text with word vectors generated by Word2Vec and then trains a logistic regression model on a corpus that combines patent descriptions and abstracts, thereby classifying patent text automatically. In the experiments, the method achieves good classification results, with classification accuracy for individual categories reaching 83.6%. Compared with the k-nearest neighbor algorithm, the model shows significant improvements in precision, recall, and F1 score. In addition, Word2Vec significantly improves classification results compared with one-hot encoding and TF-IDF, but it cannot handle polysemy.

Second, to address the polysemy that Word2Vec cannot handle in text representation, a Chinese patent text classification method based on the BERT model is proposed. This method captures word order, context, and grammatical information in a sentence and uses the Transformer encoder to obtain dynamic word vectors, that is, the same word receives different vector representations in different contexts, which improves the representational power of the word vectors. The experimental results show that the BERT-based model achieves better precision, recall, and F1 score than the Word2Vec-based model. The patent text classification methods proposed in this thesis can serve as a reference for research on and application of automatic patent text classification.
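
The first method in the abstract (word vectors as the text representation, followed by logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the tiny hand-written `TOY_VECTORS` table stands in for trained Word2Vec vectors, documents are assumed to be already segmented into tokens, each document is represented by the average of its word vectors, and the classifier is binary and fitted by plain gradient descent, whereas the thesis setting presumably involves many patent categories. The last function shows how precision, recall, and F1 are computed for one class.

```python
import math

# Hypothetical 3-dimensional vectors standing in for trained Word2Vec output.
TOY_VECTORS = {
    "battery": [0.9, 0.1, 0.0],
    "electrode": [0.8, 0.2, 0.1],
    "voltage": [0.7, 0.0, 0.2],
    "gear": [0.1, 0.9, 0.0],
    "shaft": [0.0, 0.8, 0.1],
    "bearing": [0.2, 0.7, 0.0],
}


def doc_vector(tokens, vectors=TOY_VECTORS, dim=3):
    """Represent a document as the average of its known word vectors."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy corpus: label 1 = "electrical" patents, label 0 = "mechanical" patents.
train_docs = [
    ["battery", "electrode"], ["voltage", "battery"], ["electrode", "voltage"],
    ["gear", "shaft"], ["bearing", "gear"], ["shaft", "bearing"],
]
train_labels = [1, 1, 1, 0, 0, 0]

X = [doc_vector(doc) for doc in train_docs]
w, b = train_logreg(X, train_labels)
train_pred = [predict(w, b, x) for x in X]
print(precision_recall_f1(train_labels, train_pred))
```

With real data, the vector table would come from a Word2Vec model trained on the segmented patent corpus, and a library classifier would normally replace the hand-written training loop. Because each word has a single fixed vector in this representation, a word with several senses gets the same vector in every context, which is the polysemy limitation that motivates the BERT-based method.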

Keywords/Search Tags:

Chinese patent, text classification, BERT, Word2Vec, logistic regression, deep learning

Related items

1	The Research On Multi-Classification Of Emotions Based On Chinese Micro-blog Text
2	The Study Of Automatic Chinese Patent Classification Based On Deep Learning Theory And Method
3	Research And Design Of Chinese Patent Text Classification Based On Deep Learning
4	Study On Hierarchical Text Categorization Of Patent Data Based On Fuzzy Logistic
5	Research On Chinese Text Feature Classification Based On Distributed Framework
6	Research On Chinese News Classification Algorithm Based On Deep Learning
7	Research On Key Technologies Of Chinese Text Classification Based On Deep Learning
8	The Study Of Text Classification And Retrieval For Chinese Patent
9	Research On Hierarchical Text Emotional Classification Based On Deep Learning
10	Research On Short Text Classification Technology Based On Deep Learning