Topic Extraction and Visualization of Patent Text

Posted on: 2021-11-11  Degree: Master  Type: Thesis
Country: China  Candidate: C H Guo
GTID: 2518306560953199  Subject: Master of Engineering
Abstract/Summary:
Topic models can mine the latent semantic information in text and produce a better representation of its content. Extracting topics from patent texts helps patent analysts understand the current state of technological development in a given field. In recent years, topic models represented by LDA have been widely used in text mining tasks, and, as deep learning has continued to develop, word vectors have gradually taken an important place in natural language analysis research. Researchers have improved the topic model to meet different needs: constructing a thesaurus to make topic extraction results more interpretable, adding the emoticon features specific to Weibo for sentiment analysis of Weibo content, adding topic extraction parameters to improve the accuracy of topic sentiment analysis, and adding variables to improve the accuracy of user interest mining. Many of these studies share a similarity: research has mainly focused on topic extraction from a fixed set of patent texts. The development of topics over time has gradually become a research hotspot, and analyzing topics with the time factor taken into account has become a necessity given today's rapid pace of development. In response, this thesis focuses on topic extraction from patent texts under the influence of the time factor. It constructs a topic model and presents topic information along multiple dimensions through visualization, so that the information can be taken in at a glance. The main work is as follows:

(1) Based on the characteristics of patent texts, a custom dictionary and part-of-speech filtering are added during data preprocessing to preserve the topical relevance of the data and the integrity of the word segmentation. A topic model that incorporates time and semantic factors is proposed. The model is applied to the patent texts to obtain patent topic text clusters, and a topic word extraction algorithm is then run on each topic text to obtain the final topic words, preserving their temporal and semantic characteristics.

(2) An MA-based method for identifying hot topic words and a KDJ-based method for identifying new topic words are proposed. The topic words extracted by the topic model are analyzed and screened statistically, yielding the hot topic words and new topic words that meet the decision criteria. These are presented as word clouds, and sliding the time axis shows how the topic words change over time.

Patent texts downloaded from patent retrieval websites are used as experimental data, with F-measure (F1), recall (R), and precision (P) as evaluation metrics. A comparison of the traditional LDA model with the LDA and Doc2vec fusion model based on the time factor and semantic enhancement shows that the proposed topic model effectively addresses the identification of hot topic words and new topic words as time changes, improves the value of the extracted topics, and allows the topics of the current time range to be determined from the final topics.
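
The abstract does not give the formulas behind the MA-based and KDJ-based screening, so the following is only a minimal sketch of how such screening might look. It assumes that MA refers to a simple moving average and KDJ to the stochastic oscillator familiar from stock analysis, both applied to a topic word's frequency per time period. The function names, window sizes, and the threshold of 80 are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def moving_average(series: List[float], window: int) -> List[float]:
    """Simple moving average; early positions average whatever data exists so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def kdj(series: List[float], n: int = 3) -> List[Tuple[float, float, float]]:
    """KDJ values per period, using the frequency series itself as high/low/close."""
    k, d = 50.0, 50.0  # conventional seed values for K and D
    out = []
    for i in range(len(series)):
        window = series[max(0, i - n + 1):i + 1]
        low, high = min(window), max(window)
        # Raw stochastic value; fall back to the neutral 50 when the window is flat.
        rsv = 50.0 if high == low else (series[i] - low) / (high - low) * 100.0
        k = (2.0 * k + rsv) / 3.0
        d = (2.0 * d + k) / 3.0
        j = 3.0 * k - 2.0 * d
        out.append((k, d, j))
    return out


def is_hot(series: List[float], short: int = 2, long: int = 4) -> bool:
    """Hypothetical rule: hot if the short-window MA is above the long-window MA now."""
    return moving_average(series, short)[-1] > moving_average(series, long)[-1]


def is_new(series: List[float], n: int = 3, threshold: float = 80.0) -> bool:
    """Hypothetical rule: new if the latest J value exceeds the threshold."""
    return kdj(series, n)[-1][2] > threshold


# Illustrative per-period frequencies for two made-up topic words.
rising = [0, 0, 1, 3, 8, 15]
flat = [5, 5, 5, 5, 5, 5]
print(is_hot(rising), is_new(rising))
print(is_hot(flat), is_new(flat))
```

Under these assumed rules, a word whose frequency climbs steeply in recent periods is flagged, while a word with constant frequency is not. The decision criteria actually used in the thesis may differ.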
Keywords/Search Tags:Patent text, Topicalization, Doc2vec, Topic model, Hot topic words, New theme words