Research on Semantic Analysis Method in Patent Technology Topic Mining

Posted on: 2022-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: C X Sun
GTID: 2518306557477564
Subject: Computer Science and Technology
Abstract/Summary:
A patent is both an invention and an intellectual property right. It is protected by law and records a large body of scientific and technological achievements and innovative technologies. According to statistics, 95% of the world's innovations are protected through patent applications. Mining and analyzing the knowledge contained in existing patents is therefore a prerequisite for scientific and technological innovation. Patent technology topic mining is one of the basic tasks of patent analysis: it helps patent analysts quickly grasp an overview of the patent corpus in a given field, and its results can be used in patent classification, patent information extraction, and other downstream patent mining tasks.

The traditional LDA (Latent Dirichlet Allocation) model rests on the bag-of-words assumption and on word co-occurrence. Because topics represented by individual words carry little semantic information, the resulting patent technology topics have poor readability and interpretability. Compared with ordinary texts, patent texts contain a large number of technical terms, which usually carry richer semantic information than single words. This thesis therefore constructs a term-based topic model for patent technology topic mining to improve the interpretability and quality of the topics.

Traditional rule-based term extraction requires designing a large number of rules, depends heavily on domain knowledge, and produces rules that are difficult to transfer from one domain to another. Term extraction based on statistical machine learning, in turn, depends too heavily on the quality of hand-crafted features. Building end-to-end term extraction models with fewer hand-crafted features has therefore become the mainstream research direction.

The main research contents of this thesis cover the following two aspects:

(1) A TE-CRF model based on multi-feature fusion for patent technology term extraction. From the perspective of semantic analysis, word vectors augmented with part-of-speech and dependency features are used as the model input, which makes effective use of the semantic features implied in patent text and, to a certain extent, enriches the semantic input of the model. Because deep learning generalizes well and depends less on hand-crafted feature selection, a TE (Transformer Encoder) is combined with a CRF model so that technical term extraction is recast as sequence tagging. To suit sequence tagging, the TE uses relative position encoding and omits the scaling of attention scores, which makes attention more sensitive and makes the terms in a sentence easier to capture. Experimental results show that this deep learning method outperforms several typical term extraction models in domain term extraction.

(2) A term-based patent technology topic mining model. LDA models that express topics with words often produce topics of low quality and poor interpretability. This thesis proposes a term-based patent technology topic mining model. Building on an improved phrase LDA, word vectors pre-trained by BERT (Bidirectional Encoder Representations from Transformers) are introduced as additional semantic knowledge to address the poor interpretability of the term extraction results. Experiments show that the term-based topic representation model outperforms the traditional LDA model, and that word vectors pre-trained by BERT outperform word2vec as a source of semantic knowledge.
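The remark that omitting the scaling of attention scores makes attention "more sensitive" can be illustrated with a small sketch. The code below is not the thesis's TE-CRF model: it leaves out relative position encoding and the CRF layer entirely, and the function names and toy vectors are assumptions chosen only for illustration. It compares dot-product attention weights for one query with and without division by sqrt(d).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, scale=True):
    """Dot-product attention weights of one query over a list of keys.

    With scale=True the scores are divided by sqrt(d), as in the standard
    Transformer; with scale=False the raw dot products are used.
    (Hypothetical helper, not taken from the thesis.)
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        scores = [s / math.sqrt(d) for s in scores]
    return softmax(scores)

# Toy 4-dimensional vectors (illustrative values, not from the thesis).
query = [1.0, 0.5, -0.5, 1.0]
keys = [
    [1.0, 0.5, -0.5, 1.0],   # closely matches the query
    [0.0, 1.0, 1.0, 0.0],    # orthogonal to the query
    [-1.0, 0.0, 0.5, -1.0],  # points away from the query
]

scaled = attention_weights(query, keys, scale=True)
unscaled = attention_weights(query, keys, scale=False)
print("scaled:  ", [round(w, 3) for w in scaled])
print("unscaled:", [round(w, 3) for w in unscaled])
```

In this toy case the unscaled weights concentrate more heavily on the best-matching key than the scaled ones, because dividing the scores by sqrt(d) flattens the softmax distribution. That is one plausible reading of "more sensitive"; the abstract itself does not spell out the mechanism.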
Keywords/Search Tags: Topic Mining, Term Extraction, Multi-Feature Fusion, Transformer Encoder, BERT