
Research On Unsupervised Keyword Extraction Of Patent Field

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
GTID: 2428330614958476
Subject: Computer technology
Abstract/Summary:
With the progress of science and technology, patent text, as an objective form of protecting researchers' intellectual property rights, has attracted growing interest from scholars. Patent text is structurally complex, long, and highly specialized. As a highly condensed summary of a patent, its keywords help readers locate the patent quickly and understand its main content, so keyword extraction from patent text is an important and basic task. Most existing keyword extraction systems run slowly, produce results with poor readability, and tend to return redundant keywords. To improve the accuracy of patent keyword extraction, this thesis focuses on applying unsupervised keyword extraction algorithms to patent text. The research has two main aspects.

First, unsupervised methods designed for the general domain give low accuracy when applied to patent text. To address this, a text embedding keyword extraction method based on patent element constraints is proposed, which extracts important keywords directly from the patent text. Experimental results show that the accuracy and recall of this method are higher than those of the traditional TF-IDF, TextRank, and LDA keyword extraction methods.

Second, current topic models produce low-quality keywords for patent text. To address this, a keyword extraction method that introduces patent elements into the LDA topic model is proposed. It combines patent elements with the candidate keywords extracted by LDA and uses the Borda sorting algorithm to obtain high-quality keywords. Verification on published patents shows that the proposed method performs better than the LDA or LSI extraction algorithm used alone.

By incorporating patent elements into keyword extraction, this thesis improves both the unsupervised text embedding keyword extraction method and the LDA topic model algorithm, raising the accuracy and diversity of keyword extraction from different angles.
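
The Borda step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which Borda variant the thesis uses or how the patent-element candidates are ranked, so everything below is an assumption: a plain Borda count in which a term at position p of a list earns n - p points (n is the number of distinct candidates), terms missing from a list earn nothing, and ties are broken alphabetically. The function name `borda_merge` and the sample terms are hypothetical.

```python
from collections import defaultdict


def borda_merge(rankings, top_k=5):
    """Merge ranked candidate lists with a plain Borda count.

    Illustrative sketch only, not the thesis's implementation.
    Each ranking is a list of distinct terms, best first.
    """
    candidates = {term for ranking in rankings for term in ranking}
    n = len(candidates)
    scores = defaultdict(int)
    for ranking in rankings:
        for position, term in enumerate(ranking):
            # Top-ranked term earns n points, the next n - 1, and so on.
            scores[term] += n - position
        # Terms absent from this ranking earn 0 points from it.
    # Highest score first; alphabetical order breaks ties.
    ordered = sorted(candidates, key=lambda t: (-scores[t], t))
    return ordered[:top_k]


# Hypothetical candidate lists: one from LDA, one from patent elements.
lda_candidates = ["battery", "electrode", "lithium", "device", "method"]
element_candidates = ["electrode", "separator", "battery", "lithium"]

merged = borda_merge([lda_candidates, element_candidates], top_k=3)
print(merged)
```

Under these assumptions, terms supported by both lists rise above terms that only one source proposes, which is the intuition behind combining LDA candidates with patent elements.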
Keywords/Search Tags: unsupervised, keywords, patent elements, text embedding technology, topic model