Comparative Research of Topic Discovery Methods in Patent Documents

Posted on: 2015-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: L F Jia
GTID: 2309330467480416
Subject: Science and technology management
Abstract/Summary:
With the rapid development of information technology, data volumes are growing quickly in every industry. Efficiently screening valuable information out of this mass of data has become a challenging task. In knowledge management, monitoring topics and their development trends has become a research hot spot. In topic mining, different scholars may choose different topic detection methods for the same problem. Common methods include co-word analysis, k-means clustering, and the recently popular LDA (Latent Dirichlet Allocation) topic model. Which method best suits a given research aim has therefore become a question of wide concern among knowledge management scholars.
Few scholars have thoroughly studied the differences among these three topic detection methods. This paper therefore compares their applicability and effectiveness in detail, from both a theoretical and a practical perspective. Drawing on an extensive literature review, we study co-word clustering analysis, the k-means algorithm, and the LDA topic model. The second chapter introduces the principle, assumptions, and workflow of each method. The third chapter compares their applicability in terms of data type and scope of application. The fourth chapter examines the advantages and shortcomings of each method. The fifth chapter presents a case study of Chinese patent documents in the auto parts field, analyzed with co-word analysis and with the LDA model respectively.
The results confirmed our inference that, of the three methods, the LDA model outperforms co-word analysis and k-means in Chinese patent analysis. In the auto parts case, LDA recovered the complete set of themes, whereas co-word analysis found only some hot topics.
Finally, this paper concludes that co-word analysis is well suited to identifying hot spots within a discipline but performs poorly at revealing intellectual structure and development evolution. K-means is suited to large-scale text clustering but not to topic detection, because it is difficult to derive a title for each cluster. LDA has a wide range of applications and is suited to topic discovery in publication data, patent documents, and web data.
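For readers unfamiliar with co-word analysis, the following minimal Python sketch shows the pair counting on which the method is built. It is not taken from the thesis: the toy keyword lists and the function name `coword_counts` are hypothetical, and a real study would go on to cluster the resulting co-occurrence matrix rather than stop at raw counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to its keyword list.
docs = [
    ["brake", "disc", "caliper"],
    ["brake", "disc", "pad"],
    ["engine", "piston", "valve"],
    ["engine", "piston"],
]


def coword_counts(keyword_lists):
    """Count, for each unordered keyword pair, the documents containing both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical key per document.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


counts = coword_counts(docs)
# The most frequent pairs are the candidates for "hot" topics.
top_pairs = counts.most_common(2)
```

On this toy input the pairs `("brake", "disc")` and `("engine", "piston")` each occur in two documents and rank highest, while every other pair occurs once. Frequent pairs dominate the picture, which is consistent with the finding above that co-word analysis mainly surfaces hot topics.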
Keywords/Search Tags: co-word, k-means, LDA, patent