Research On Topic Model Based Patent Mining And Its Applications

Posted on: 2016-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H S Chen
GTID: 1108330503453407
Subject: Management Science and Engineering
Abstract/Summary:
In the new century, science and technology have entered a stage of unprecedentedly rapid development, which has led to the fast production and accumulation of patent documents. Under these circumstances, text mining for patent analysis is no longer a relatively isolated auxiliary module but a significant part of technological decision making. It helps reveal potentially useful knowledge and supports technology strategy formulation, which makes it crucial to mine patent text accurately and efficiently. In the last decade, this topic has attracted considerable attention in both the public and private domains.
This thesis is text mining oriented. Based on a review of the literature on patent text mining at home and abroad, and on the actual demands of applications, this research brings one of the state-of-the-art topic models, Latent Dirichlet Allocation, to the context of patent analysis and proposes a topic modeling-based patent text mining approach. It uses unsupervised learning to uncover the hidden technological topics underlying large volumes of textual patent claims. In addition, this research estimates the detailed development trends of specific latent topics, rather than of a broad technological area, from massive numbers of documents, using an annual weight matrix and quadratic polynomial fitting. It also estimates the level of contribution that each identified topic makes to trend turns in the patenting activity of the whole target area, using Piecewise Linear Representation, the least squares method, and other quantitative methods.
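As a rough illustration of the annual weight matrix and quadratic polynomial fitting mentioned above, the following minimal sketch may help. It rests on assumptions not stated in the abstract: the annual weight of a topic is taken to be the sum of that topic's per-document proportions over all documents published in a given year, and the function names `annual_weight_matrix` and `fit_quadratic`, along with the toy inputs, are hypothetical. The thesis's actual definitions may differ.

```python
from collections import defaultdict


def annual_weight_matrix(doc_topics, years):
    """Sum each topic's per-document weight over the documents of each year.

    doc_topics: list of per-document topic distributions (each sums to 1).
    years: publication year of each document, in the same order.
    Returns {year: [weight of topic 0, weight of topic 1, ...]}.
    Assumption: "annual weight" means this per-year sum.
    """
    n_topics = len(doc_topics[0])
    matrix = defaultdict(lambda: [0.0] * n_topics)
    for theta, year in zip(doc_topics, years):
        row = matrix[year]
        for k, weight in enumerate(theta):
            row[k] += weight
    return dict(matrix)


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations.

    Needs at least three distinct x values. Returns (a, b, c).
    """
    # Power sums of x, and moment sums of x^p * y
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]
    # Augmented normal-equation system for the unknowns (a, b, c)
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Toy, made-up inputs: four documents, two topics, three publication years
doc_topics = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]
years = [2000, 2000, 2001, 2002]

weights = annual_weight_matrix(doc_topics, years)
ordered_years = sorted(weights)
# Offset the years so the fit works on 0, 1, 2, ... instead of 2000, 2001, ...
offsets = [y - ordered_years[0] for y in ordered_years]
for k in range(len(doc_topics[0])):
    series = [weights[y][k] for y in ordered_years]
    a, b, c = fit_quadratic(offsets, series)
    print(f"topic {k}: weight(t) = {a:.3f}*t^2 + {b:.3f}*t + {c:.3f}")
```

Fitting against year offsets rather than raw calendar years keeps the power sums small and the normal equations better conditioned. In practice the per-document topic proportions would come from a fitted Latent Dirichlet Allocation model rather than being typed in by hand.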
To demonstrate the effectiveness of this approach, this research then uses, as case studies, utility patents from the United States Patent and Trademark Office from the past 15 years that have Australia as their assignee country.
On the whole, the significant contributions of this thesis can be summarized as follows:
(1) A framework based on topic models for patent text mining
This thesis brings one of the state-of-the-art topic models, Latent Dirichlet Allocation, to the field of patent analysis and technology management. It proposes a topic modeling-based framework for automatic, unsupervised text mining of patent claims and develops a complete process covering data input, text cleaning, hidden topic identification, topic contribution coefficient calculation, and topic future trend estimation.
(2) Automatic topic identification approach for patent claims
This research automatically discovers the latent semantic topics hidden in massive numbers of patent claims using Latent Dirichlet Allocation with minimal human intervention. It overcomes a limitation of traditional text mining, in which a simple keyword-ranking system is too general or too misleading to indicate a concept, especially when polysemy exists. After textual data cleaning, the underlying semantic topics hidden in large archives of patent claims are revealed in an unsupervised way.
(3) Quantitative identification of temporal trend turning points
The study of trend estimation is an important part of patent analysis. Although fitting models to patent counts can provide the rough tendency of a technological area, the trend of the content within the area remains hidden. In addition, there are no obvious trend turning points from the perspective of the technology life cycle, so it is quite hard to connect the temporal attributes of patents with their semantic properties.
This research proposes an approach for quantitatively capturing temporal trend turning points using piecewise linear representation, which identifies and represents a number of trend turning points and trend segments in the patenting activities of the target area.
(4) Quantitative evaluation of how identified topics contribute to trend changes in patenting activities
The identified latent topics have their own trends and contribute at different levels to the patenting activities of the whole area. After quantitatively presenting a number of trend turning points and trend segments in the patenting activities of the target area, this research uses these outcomes to generate a sequence of topic contribution coefficients, which evaluate to what degree the various topics have contributed to the patenting activity trend shifts of the whole area.
(5) A topic trend estimation approach based on an annual weight matrix
In practice, a patent document collection is associated with multiple underlying technological topics. After text cleaning, topic modeling, and trend turning point identification, a topic annual-weight matrix is generated to quantitatively estimate the development trend of the discovered topics.
To demonstrate the effectiveness of the approach, we present a case study using 13,910 utility patents that are owned by Australian assignees and were published by the United States Patent and Trademark Office from 2000 to 2014. The results indicate that the proposed approach effectively and efficiently generates hidden topics from massive numbers of claims and estimates their individual trends and their different levels of contribution to patenting activities. This research provides valuable topic-based knowledge to facilitate further technological decision making or opportunity discovery.
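The trend turning point identification described in contribution (3) can be sketched roughly as follows. The abstract does not say which piecewise linear representation variant the thesis uses, so this sketch assumes a top-down segmentation with least-squares line fits. The function names `fit_line`, `top_down_plr`, and `turning_points`, the error threshold, and the toy series are all hypothetical.

```python
def fit_line(ys, start, end):
    """Least-squares line through points (i, ys[i]) for start <= i <= end.

    Returns (slope, intercept, sum of squared residuals).
    """
    n = end - start + 1
    xs = range(start, end + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys[start:end + 1]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ys[x] - mean_y) for x in xs)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((ys[x] - (slope * x + intercept)) ** 2 for x in xs)
    return slope, intercept, sse


def top_down_plr(ys, max_error, start=0, end=None):
    """Recursively split [start, end] until each segment fits within max_error.

    Returns a list of (start, end, slope) segments. Adjacent segments share
    their boundary index, which is a candidate trend turning point.
    """
    if end is None:
        end = len(ys) - 1
    slope, _, sse = fit_line(ys, start, end)
    if sse <= max_error or end - start < 2:
        return [(start, end, slope)]
    # Choose the interior split that minimises the combined residual error
    best = min(
        range(start + 1, end),
        key=lambda i: fit_line(ys, start, i)[2] + fit_line(ys, i, end)[2],
    )
    return (top_down_plr(ys, max_error, start, best)
            + top_down_plr(ys, max_error, best, end))


def turning_points(segments):
    """Indices where one trend segment ends and the next begins."""
    return [end for _, end, _ in segments[:-1]]


# Toy, made-up yearly counts: a rise followed by a decline
counts = [0, 1, 2, 3, 4, 3, 2, 1, 0]
segments = top_down_plr(counts, max_error=1e-9)
for seg_start, seg_end, seg_slope in segments:
    print(f"years {seg_start}..{seg_end}: slope {seg_slope:+.2f}")
print("turning points at indices:", turning_points(segments))
```

The `max_error` threshold controls how many segments appear: a small value yields many short segments, a large value yields a few long ones. The slope of each segment gives the direction and steepness of the trend between two turning points, which is the kind of quantity a contribution analysis such as the one in (4) could build on.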
Keywords/Search Tags: Text mining, Topic model, Patent claims, Patent analysis, Trend estimation, Latent Dirichlet Allocation