
Research On Patent Text Classification And Evolution Based On LDA Model

Posted on: 2018-11-28  Degree: Master  Type: Thesis
Country: China  Candidate: F G Le
GTID: 2348330518461752  Subject: Computer Science and Technology
Abstract/Summary:
Patent literature is a carrier of technical information: its text contains a large amount of technical detail and is one of the best sources of such information. With China's rapid development, the number of patent applications filed in China has increased year by year, and by 2016 China had ranked first in the world in patent applications for five consecutive years. Research on information mining techniques for this massive body of patent literature has therefore become a shared focus of government and enterprise. The LDA model is a typical probabilistic topic model that has been widely used in natural language processing, data mining, artificial intelligence and other fields to analyze the classification and evolution of text. Building on the existing framework of patent text information mining, this thesis uses the LDA model to classify patent texts and analyze their evolution. The main contents are as follows:

(1) First, several traditional probabilistic topic models are summarized and briefly described. The LDA model used in this thesis is then presented, together with the probability distributions it relies on and its parameter estimation algorithms. Finally, some typical patent text classification algorithms and evolution analysis methods are reviewed.

(2) To address the text-representation problems of the vector space model in traditional automatic patent text classification, a patent text classification method based on the LDA model is proposed. The LDA topic model is used to model the patent text corpus and to extract the document-topic and topic-word matrices of the patent texts, which reduces dimensionality and captures the semantic relations among documents. A class-topic matrix is introduced to extend topic semantics to classes, topic similarity is used to construct a hierarchical classification, and the KNN classification method is used within subclasses. Experimental results show that, compared with a KNN patent text classification method based on the vector space text representation model, this method achieves higher scores on the classification evaluation metrics.

(3) Probabilistic topic models are used to study the topic evolution of patent literature comprehensively and to discover trends in patent technology. The LDA model is used to model the patent texts within each time window, and the optimal number of topics is determined by perplexity. Topic vectors are extracted according to the structure of the patent text. The association between topics is measured with the SI divergence metric, and the technical strength of IPC classification numbers is introduced. Evolution is studied from three aspects: topic strength, topic content, and technical topic strength. The experimental results show that this method can reveal how patent technology evolves over time and where it is heading. By mining the topics of patent literature in depth, the method helps practitioners understand the evolution and trends of patent technology.
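Two of the steps described above can be illustrated with a minimal sketch: choosing the number of topics by perplexity, and KNN classification over document-topic vectors. This is not the thesis's implementation. It assumes the LDA parameters have already been estimated elsewhere, and the function names (`perplexity`, `knn_classify`), the use of cosine similarity, and the toy IPC-style labels are all illustrative assumptions.

```python
import math
from collections import Counter


def perplexity(docs, theta, phi):
    """Perplexity of a corpus under fixed LDA parameters (lower is better).

    docs  : list of documents, each a list of word ids
    theta : theta[d][k] = P(topic k | document d)
    phi   : phi[k][w]   = P(word w | topic k)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginalize over topics: P(w | d) = sum_k theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def cosine(u, v):
    """Cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training documents most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy document-topic vectors with hypothetical IPC-style class labels.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
train_labels = ["H04", "H04", "G06", "G06"]
predicted = knn_classify([0.85, 0.15], train_vectors, train_labels, k=3)
```

In practice one would fit LDA for several candidate topic counts, compute `perplexity` for each on held-out text, and keep the count with the lowest value. The resulting document-topic rows would then serve as the low-dimensional vectors passed to `knn_classify`.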
Keywords/Search Tags: Probabilistic topic model, LDA, patent literature, text classification, topic evolution